Abstract

We study the problem of detecting a change in the mean of one-dimensional Gaussian process data. 
This problem is investigated in the setting of increasing domain (customarily employed in time series 
analysis) and in the setting of fixed domain (typically arising in spatial data analysis). We propose a 
detection method based on the generalized likelihood ratio test (GLRT), and show that our method 
achieves a nearly asymptotically optimal rate in the minimax sense, in both settings. The salient feature
of the proposed method is that it exploits in an efficient way the data dependence captured by the 
Gaussian process covariance structure. When the covariance is not known, we propose the plug-in 
GLRT method and derive conditions under which the method remains asymptotically near optimal. By 
contrast, the standard CUSUM method, which does not account for the covariance structure, is shown to 
be asymptotically optimal only in the increasing domain. Our algorithms and accompanying theory are 
applicable to a wide variety of covariance structures, including the Matern class, the powered exponential 
class, and others. The plug-in GLRT method is shown to perform well for maximum likelihood estimators 
with a dense covariance matrix. 


1 Introduction 

Change point detection is the problem of detecting an abrupt change or changes arising in a sequence of 
observed samples. A common problem of this type involves detecting shifts in the mean of a temporal 
process. This problem has found a variety of applications in many fields, including audio analysis [20], EEG
segmentation [32], structural health monitoring [28, 42] and environmental sciences [35, 56]. Despite advances
in the development of algorithms |331l32l[331l48j and asymptotic theory [21 I351HM531 for a number of contexts, 
such studies are mainly confined to the setting of (conditionally) independently distributed data. Existing 
works on optimal detection of shifts in the mean in temporal data with statistically dependent observations 
are far less common. 

Incorporating dependence structures into the modelling of random processes is a natural approach. In 
fact, this has been considered in detecting changes of remotely collected data miiis]. For instance, Chandola
et al. m proposed a Gaussian process based algorithm to identify changes in Normalized Difference
Vegetation Index (NDVI) time series for a particular location in California. Despite such statistical modelling
considerations, to our knowledge, most researchers have not exploited the dependence structures of
the underlying temporal process, e.g., its covariance function and spectral density, in designing a (minimax) 
optimal detection method. 

In this paper, we focus on the detection of a single change in the mean of a Gaussian process data 
sequence. Our main contribution is to show that neglecting the dependence structures in the data samples 
leads to suboptimal detection procedures, particularly in the presence of strong correlation among samples. 
Moreover, it is possible to exploit the underlying dependence structures to design asymptotically optimal
detection algorithms. 

Consider a simplified setting in which we let $G$ be a Gaussian process in a domain $\mathcal{D} \subset \mathbb{R}$ and
$\mathcal{D}_n := \{t_k\}_{k=1}^{n} \subset \mathcal{D}$ denote the finite index set of sampling points. Denote the observed samples by
$X = (X_k)_{k=1}^{n}$, in which $X_k = G(t_k)$ for $k = 1, \ldots, n$. Moreover, let $t \in \mathcal{C}_{n,\alpha} \subset \{1, \ldots, n\}$ (the parameter
$\alpha$ is a positive scalar which will be introduced in Section 2.3) and $b > 0$ represent the point of sudden change
and the jump/shift value, respectively. Namely, there is $\mu \in \mathbb{R}$ (which will be assumed to be 0 for now) such that

\[
\mathbb{E} X_k = \left(\mu - \frac{b}{2}\right) \mathbb{1}(k \le t) + \left(\mu + \frac{b}{2}\right) \mathbb{1}(k > t), \qquad k \in \{1, \ldots, n\}. \tag{1.1}
\]

To analyze the performance of a detection procedure as the sample size $n$ grows to infinity, one is confronted with
two fundamentally different theoretical frameworks: increasing domain asymptotics and fixed domain (infill)
asymptotics ([46], Chapter 13). The former arises naturally in time series analysis, and is distinguished
by the constraint that the distance between consecutive sampling time points is bounded away from zero.
The simplest instance of the sampling scenario in this regime arises when the diameter of $\mathcal{D}_n$ is of order
$n$ and $\min_i |t_{i+1} - t_i| > \delta$ for some strictly positive, fixed scalar $\delta$. In our notation the index set for the
Gaussian process represents the sampling time points. Typically we set $\mathcal{D} = \mathbb{R}$ and $\bigcup_{n=1}^{\infty} \mathcal{D}_n = \mathbb{N}$ or $\mathbb{Z}$. See,
for examples of works studying change point detection via increasing domain asymptotics.

Fixed domain asymptotics, on the other hand, is a more suitable setting when the index set of sampling
points $\mathcal{D}$ is bounded, so that the observations get denser in $\mathcal{D}$ as $n$ increases. Particularly for $\mathcal{D} \subset \mathbb{R}$,
we have that $\min |t_{i+k} - t_i| = O(k/n)$ for positive integers $i, k$ with $i, (i + k) \in \{1, \ldots, n\}$, and it can be
extended to multidimensional domains in a straightforward way. This is the case for spatially distributed
data [53], where the domain of the index set is typically of one, two or three dimensions. But there are also
other examples: a particularly useful approach to change point detection in non-stationary processes widely
adopted in speech processing and finance can also be cast in this framework. In fact, this approach involves
piecewise locally stationary (PLS) processes, which can be interpreted as processes that are approximately
piecewise stationary, due to the gradual and smooth change of the spectrum HHHi. Abrupt change detection
in PLS processes has been considered in, for instance, mm- Since the future samples of a PLS process
may not carry any information about the current state of the process, the increasing domain setting cannot
suitably capture the dependence structure of the observed data sequence. As a result, the theory of change
detection and inference in PLS processes is also based upon fixed domain asymptotics.

Our goal in this work is to derive a computationally efficient detection procedure that effectively accounts
for the underlying dependence structures of the observed sequential data. For the analysis of such a procedure
we shall adopt both aforementioned asymptotic frameworks. Obviously, specific applications ultimately
determine which one among the two is more suitable. There are also several motivations for this dual
treatment. First, we are not aware of any established detection algorithm and accompanying asymptotic
theory under the fixed-domain setting. Second, although there is a vast literature on change point
detection in the increasing domain scenario, existing work focuses mostly on independent (or conditionally
independent) data sequences, so our analysis in this setting still carries some notable novelty. Third, there is
a fundamental difference in the behavior of the same detection algorithm when applied to the two asymptotic
settings. This point is worth highlighting: we will show that in the fixed domain setting, ignoring the
underlying dependence structure may result in suboptimal detection performance, but this is not the case for
the increasing domain setting. Finally, we note that the fixed domain minimax detection problem considered
in this paper also serves as a useful starting point in the study of optimal detection of discontinuities in
Gaussian spatial processes, as considered recently by niisn].

The amount of correlation between nearby samples is one of the differences between the two asymptotic
frameworks that plays a crucial role in our results. In the fixed domain regime, regardless of how large
the sample size is, if $|j - i|$ is of order $n^{\beta}$ for some $\beta \in (0,1)$, the correlation between $X_i$ and $X_j$ is still
close to one. Roughly speaking, the effective sample size is much smaller than $n$. However, in the increasing
domain setting, even for long range dependent processes the correlation between samples at time points $i, j$
is small when $|j - i|$ is large. Accordingly, the asymptotic behavior of a statistical method applied to the
fixed-domain regime is expected to be distinct from that of the increasing domain regime. Moreover, new
techniques are needed to address the statistical dependence intrinsic in the former.
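
The contrast between the two regimes can be checked numerically. The sketch below is only an illustration under assumptions that are not part of the paper: it uses the exponential covariance $K(r) = \exp(-|r|/\rho)$ and the hypothetical helper names `exp_cov` and `lag_correlations`, and it compares the correlation of two samples whose indices differ by about $n^{\beta}$ under the sampling schemes $t_k = k/n$ and $t_k = k$.

```python
import numpy as np

def exp_cov(r, rho=1.0):
    """Exponential covariance K(r) = exp(-|r| / rho); an illustrative choice only."""
    return np.exp(-np.abs(r) / rho)

def lag_correlations(n, beta=0.5):
    """Correlation of X_i and X_j with |j - i| about n**beta, in both regimes."""
    lag = int(round(n ** beta))
    fixed = exp_cov(lag / n)      # fixed domain: t_k = k / n, so the time gap is lag / n
    increasing = exp_cov(lag)     # increasing domain: t_k = k, so the time gap is lag
    return lag, float(fixed), float(increasing)

for n in (100, 10_000, 1_000_000):
    lag, fixed, increasing = lag_correlations(n)
    print(f"n={n:>9}  lag={lag:>5}  fixed={fixed:.4f}  increasing={increasing:.2e}")
```

Under this assumed kernel the fixed domain correlation moves towards one as $n$ grows while the increasing domain correlation decays to zero, which is the qualitative behaviour described in the paragraph above.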

From here on, for fixed domain asymptotic results, we assume that $G$ is a one dimensional Gaussian
process in $\mathcal{D} = [0,1]$ with $n$ regularly spaced samples, i.e. $\mathcal{D}_n = \{k/n\}_{k=1}^{n}$. Under the increasing domain
asymptotics framework, we let $\mathcal{D} = [0, \infty)$ and $\mathcal{D}_n = \{1, \ldots, n\}$. The remaining notations are the same for
both.


Previous works. One of the earliest attempts to study shift in mean detection was perhaps that of
Chernoff et al. m- More general settings of this problem have been studied in subsequent works, e.g.,
nsmni^. For instance it is assumed in (SSI that the sequence of $X_k$'s are independent Gaussian variables.
They proposed a detection method based on the generalized likelihood ratio test (GLRT), also known as the
cumulative sum (CUSUM) test, and given by

\[
T_{\mathrm{CUSUM}} = \mathbb{1}\left( \max_{t \in \mathcal{C}_{n,\alpha}} \sqrt{\frac{t(n-t)}{n}} \left| \frac{1}{t} \sum_{k=1}^{t} X_k - \frac{1}{n-t} \sum_{k=t+1}^{n} X_k \right| \ge R_n \right). \tag{1.2}
\]



CUSUM compares the maximum of a test statistic over $\mathcal{C}_{n,\alpha}$ with a critical value $R_n$. Non-asymptotic
upper bounds on the error probabilities of this simple test were obtained by the authors under the Gaussian
and i.i.d. assumptions. Interestingly, due to the test's simplicity, when such assumptions do not hold, one can
still apply the same test statistic to the data sequence. Most subsequent works appear to follow this
direction, in addition to adhering to the increasing domain asymptotic framework, e.g. [aUSlIlillT]. We
wish to mention Rencova (|4^, chapter 4), who studied the same CUSUM test as [59], but working with
the assumption that $X$ is a strong mixing time series. Kokoszka [34] also analyzed the CUSUM test, but
working with a different dependent observation model with sub-squared growth of the variance of partial
sums, i.e. there is $\delta \in (0, 2)$ such that for any $k < m$, $\operatorname{var}\big(\sum_{j=k}^{m} X_j\big) \le C (m - k + 1)^{\delta}$ for some constant $C$.
Horvath et al. pFI]26j and Antoch |S] studied the performance of the CUSUM test for the detection of a sudden
change in the mean in linear processes, i.e. $X_t = \sum_{j} w_j \epsilon_{t-j}$, in which $\{\epsilon_t\}_{t=-\infty}^{\infty}$ are i.i.d. and zero mean
random variables and the weights $\{w_j\}$ satisfy some properties such as absolute or square summability.
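
For concreteness, a CUSUM-type statistic can be sketched as follows. This is a minimal illustration and not necessarily the exact normalization of (1.2): it assumes the classical standardized difference of means, $\sqrt{t(n-t)/n}\,|\bar{X}_{1:t} - \bar{X}_{t+1:n}|$, maximized over $\mathcal{C}_{n,\alpha} = \{t : t \wedge (n - t) \ge \alpha n\}$. The function names `cusum_statistic` and `cusum_test` are hypothetical.

```python
import numpy as np

def cusum_statistic(x, alpha):
    """Max over t in C_{n,alpha} of sqrt(t(n-t)/n) * |mean(x[:t]) - mean(x[t:])|.

    Assumed classical form of the CUSUM statistic; returns (value, argmax t).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    csum = np.cumsum(x)
    best_val, best_t = -np.inf, None
    for t in range(1, n):
        if min(t, n - t) < alpha * n:   # t is outside C_{n,alpha}
            continue
        mean_before = csum[t - 1] / t
        mean_after = (csum[-1] - csum[t - 1]) / (n - t)
        val = np.sqrt(t * (n - t) / n) * abs(mean_before - mean_after)
        if val > best_val:
            best_val, best_t = val, t
    return best_val, best_t

def cusum_test(x, alpha, threshold):
    """Return 1 (change declared) if the statistic reaches the threshold R_n, else 0."""
    stat, _ = cusum_statistic(x, alpha)
    return int(stat >= threshold)

# Noise-free toy sequence with a jump of size 2 after the 50th sample.
x = np.concatenate([-np.ones(50), np.ones(50)])
print(cusum_statistic(x, alpha=0.1))
```

Note that no covariance information enters this statistic, which is the point made in the text: the same formula is applied whether or not the samples are dependent.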

At first sight, it may seem puzzling how the CUSUM test continues to admit nearly optimal detection
performance even as its test statistic apparently ignores the dependence among data samples. A high-level
explanation can be made regarding this phenomenon: when $G$ is a Gaussian process with $\bigcup_{n=1}^{\infty} \mathcal{D}_n = \mathbb{N}$ and
the covariance function $\operatorname{cov}(X_s, X_t) \to 0$ as $|t - s|$ grows to infinity, the percentage of pairs $(X_s, X_t)_{s,t=1}^{n}$
whose covariance is non-negligible tends to zero as $n \to \infty$. As a result, there is very little gain in accounting
for the dependence structures underlying the sequence, and so the CUSUM statistic provides a good
approximation of the likelihood ratio test for large $n$, leading to the asymptotic optimality of $T_{\mathrm{CUSUM}}$ in
the increasing domain setting. This is of course not the case for the fixed domain setting. Indeed, one of
the contributions of this paper is to show that the CUSUM test is suboptimal when applied to the fixed
domain setting of the detection problem. Moreover, to achieve minimax optimal detection performance we
shall develop a new test statistic that accounts for the underlying dependence structures in the data sequence.
We also note in passing that in comparison to the increasing domain analysis typically encountered in the
literature, the theoretical analysis for the fixed domain setting is considerably more challenging, as one needs
to take into account the statistical dependence in the data sequence in a more fundamental way.

The CUSUM test has also been applied to one dimensional processes with highly correlated samples, after a proper
standardization. For instance, Horvath et al. [26] used a different normalizing factor for applying CUSUM to
one dimensional Gaussian time series with long range dependence. However, apart from the standardizing factor,
they do not directly incorporate the correlation structures of the data in the formulation of the test statistic.
Note that the test proposed in this paper achieves consistency under a weaker condition on the minimal detectable
jump. Furthermore, Lai m adjusts the CUSUM test for detecting abrupt changes in the mean of a stochastic
process. However, we are not aware of any analysis of the change point problem, for the case of unknown before
and after distribution parameters, in the infill regime and its comparison with the increasing domain setup.
There are also some notable works (see e.g., |S5]) on estimating the volatility parameter of non-stationary
time series using change point models.
Overview of main results. In this paper we study the change point detection problem that arises in
Gaussian processes in both settings of increasing and fixed domains. We show when it is important to 
account for the dependence structure in the data sequence, and analyze a number of detection algorithms 
based upon a generalized likelihood ratio test. More specifically, our contributions are as follows. 

1. Given an n-sample drawn from a one dimensional Gaussian process data with a known covariance 
structure, we propose a generalized likelihood ratio test for detecting a sudden shift in the mean. This 
method requires the knowledge of the dependence structure (via the covariance matrix), and will be 
shown to achieve asymptotically optimal detection performance in both fixed and increasing domain 
settings. Our theory holds for a variety of covariance structures, such as the Matern class, powered 
exponential class, and several others specified in terms of the covariance kernel’s spectral density. The 
smoothness parameter for the Gaussian process (which determines how fast the corresponding spectral 
density decays) plays a central role in characterizing the minimax optimal detection performance — 
but this is shown to be the case only in the fixed domain setting, not in the increasing domain setting. 

2. We provide an upper bound guarantee for the CUSUM detection method. This result confirms that
the CUSUM is asymptotically optimal in the increasing domain setting, but it also suggests that the
CUSUM is suboptimal in the fixed domain setting. The suboptimality is confirmed in our simulation
study, which demonstrates a wide gap between the CUSUM and our method. This result makes sense,
in light of the minimax result described in the preceding paragraph.

3. In practice, the covariance structure is not known a priori, and often has to be estimated from the 
data. To address this scenario, we propose a plug-in GLRT method, and analyze its performance. In 
particular, we derive detection performance bounds which also account for the quality of a particular 
covariance estimation method (such as MLE based methods with dense or tapered covariances) used 
in the plug-in GLRT. Most surprisingly, we show that as long as a consistent covariance estimate is 
employed (the notion of consistency will be defined in Section 4), regardless of its estimation rate, the
plug-in GLRT achieves asymptotically optimal detection performance. Moreover, in some situations a 
plug-in GLRT with an inconsistent covariance estimate is shown to perform almost as well as the case 
of known covariance. 

In addition to the above contributions, our proof methods contain several useful techniques. Our proofs 
integrate four major technical tools: we exploit properties of mutually absolutely continuous Gaussian measures,
the decorrelation of samples drawn from Gaussian processes in fixed domain, the non-asymptotic analysis 
of the inverse of large Toeplitz matrices, in addition to the classical theory of minimax detection. We 
also develop novel techniques for analyzing different norms of the decorrelation matrix of X in either of 
the asymptotic regimes. The Appendices provide several beneficial and easy-to-reference technicalities for 
theoretical problems in the area of Gaussian random fields and time series that may be of independent 
interest. 

Structure of the paper. Section 2 precisely formulates detection of a shift in the mean, focusing on
one dimensional Gaussian process data, and then introduces our proposed detection algorithm as hypothesis
testing in the case of fully known spectral density of the underlying process. We also adapt our proposed
detection technique, which will be referred to as the plug-in test, to the much more realistic case of unknown
spectral density. Note that Sections 3, 4 and 5 are divided into two subsections focusing on theoretical results
in the fixed domain and the increasing domain setting, respectively. Section 3 presents sufficient conditions
on the shift value $b$ and spectral density to detect the existence of a shift in mean with high probability. Section 4
is devoted to analyzing the performance of the plug-in test by imposing some sufficient conditions on the estimated
spectral density. Section 5 serves as a comprehensive study of the CUSUM test. The minimax optimality
of the proposed algorithms will be discussed in Section 6. Section 7 is devoted to the numerical experiments
and assessing the proposed algorithms using simulation studies. Section 8 contains concluding remarks with a
concise discussion of future directions. Appendix A contains the proofs of the main results and Appendix B
presents and proves some auxiliary results used in Appendix A. Lastly, Appendix C develops non-asymptotic
results on the inverse of large Toeplitz matrices which are useful in the study of the CUSUM test.


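Notation. $\wedge$ and $\vee$ stand for minimum and maximum operators and the indicator function is represented
by $\mathbb{1}(\cdot)$. For any $m \in \mathbb{N}$, $I_m$, $0_m$ and $1_m$ respectively denote the $m$ by $m$ identity matrix, all zeros column
vector of length $m$, and all ones column vector of length $m$. For two matrices of the same size $M_1$ and $M_2$,
$\langle M_1, M_2 \rangle := \sum_{i,j} (M_1)_{ij} (M_2)_{ij}$ denotes their usual inner product. For any symmetric matrix $M$, $\lambda_{\min}(M)$
represents the smallest eigenvalue of $M$. We will use the following matrix norms on $M \in \mathbb{R}^{m \times m}$. For any
$1 \le p < \infty$, $\|M\|_{\ell_p} := \big( \sum_{i,j} |M_{ij}|^p \big)^{1/p}$ stands for the element-wise $\ell_p$ norm of $M$, while
$\|M\|_{\ell_\infty} := \max_{i,j} |M_{ij}|$ represents the sup norm of $M$. For a function $f : \mathcal{D} \to \mathbb{R}$ and $p > 0$,
$\|f\|_p := \big( \int_{\mathcal{D}} |f(u)|^p \, du \big)^{1/p}$. The special case of $p = \infty$ is defined by $\|f\|_\infty := \sup_{u \in \mathcal{D}} |f(u)|$.
For any $f \in L^1(\mathbb{R})$, $\hat{f}$ represents its Fourier transform defined by

\[
\hat{f}(\omega) = \int_{-\infty}^{\infty} f(t) e^{-j \omega t} \, dt, \qquad \forall \omega \in \mathbb{R},
\]

where $j = \sqrt{-1}$ denotes the imaginary unit. Moreover, for a symmetric function $f \in L^{\infty}([-\pi, \pi])$, $\{f_m\}_{m \in \mathbb{Z}}$
denotes the set of its Fourier coefficients. Assuming that $\{f_m\}_{m \in \mathbb{Z}}$ is absolutely summable, $\mathcal{T}_{\mathbb{N}}(f)$
represents the infinite Toeplitz matrix generated by the Fourier coefficients of $f$, i.e.

\[
\mathcal{T}_{\mathbb{N}}(f) :=
\begin{pmatrix}
f_0 & f_1 & f_2 & \cdots \\
f_1 & f_0 & f_1 & \cdots \\
f_2 & f_1 & f_0 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix}. \tag{1.3}
\]

Moreover, $\mathcal{T}_n(f) = \big[ \mathcal{T}_{\mathbb{N}}(f) \big]_{i,j=1}^{n}$ denotes the $n \times n$ truncated Toeplitz matrix generated by $f$. For two
functions $f$ and $g$ on $\mathbb{R}$, we write $f(t) \asymp g(t)$ as $t \to t_0$, if $C_1 \le \lim_{t \to t_0} \left| f(t)/g(t) \right| \le C_2$ for some strictly
positive bounded scalars $C_1 \le C_2$. In particular, we write $f(t) \sim g(t)$ as $t \to t_0$ to indicate the case that
$C_1 = C_2 = 1$. Furthermore, for sequences $a_n$ and $b_n$, we write $b_n = \Omega(a_n)$ when $b_n$ is bounded below by
$a_n$ asymptotically, i.e. $\lim_{n \to \infty} |b_n / a_n| \ge C$ for some positive $C$. In case $a_n$ and $b_n$ are random, $b_n = o_{\mathbb{P}}(a_n)$
means that $b_n / a_n$ converges in probability to zero as $n \to \infty$. For two sets $\Pi_1, \Pi_2 \subset \mathbb{R}^m$,
$\operatorname{dist}(\Pi_1, \Pi_2) := \inf_{\omega_i \in \Pi_i,\, i = 1, 2} \|\omega_1 - \omega_2\|_{\ell_2}$ stands for their mutual distance with respect to Euclidean distance. Lastly, $\Gamma(\cdot)$
denotes the gamma function.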
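
The truncated Toeplitz matrix of (1.3) is easy to form numerically. The following is a small sketch, assuming the coefficients are supplied as a finite list `[f_0, f_1, ...]` and that any coefficient not supplied is treated as zero (an assumption made only for this illustration); the helper name `truncated_toeplitz` is hypothetical.

```python
import numpy as np

def truncated_toeplitz(coeffs, n):
    """n x n symmetric Toeplitz matrix with (i, j) entry f_{|i - j|}.

    coeffs = [f_0, f_1, ...]; coefficients beyond the supplied ones are set to zero.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    if coeffs.size < n:
        coeffs = np.concatenate([coeffs, np.zeros(n - coeffs.size)])
    idx = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return coeffs[idx]

# Illustrative, absolutely summable coefficients f_m = 0.5**m with f_0 = 1.
T = truncated_toeplitz(0.5 ** np.arange(6), n=6)
print(T[:3, :3])
print("smallest eigenvalue:", np.linalg.eigvalsh(T).min())
```

The smallest eigenvalue printed at the end corresponds to $\lambda_{\min}$ in the notation above.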


2 Problem formulation and detection algorithms 

In this section we present a formulation of the shift-in-mean detection problem associated with Gaussian 
process data, and then describe detection algorithms that account for the underlying process modelling 
assumptions. Recall that $G$ is a Gaussian process in $\mathcal{D} \subset \mathbb{R}$ and $\{X_k = G(t_k)\}_{k=1}^{n}$ represents the set of $n$
samples of $G$ at times $\mathcal{D}_n = \{t_k\}_{k=1}^{n} \subset \mathcal{D}$. At this point we proceed to split the problem formulation into
two subsections according to the two different asymptotic settings.

2.1 Gaussian process model in fixed domain setting 

In the fixed domain setting, $G - \mathbb{E}G$ is assumed to be a mean-zero stationary Gaussian process on a bounded
domain $\mathcal{D} = [0,1]$ and regularly sampled at $t_k = k/n$ for $k = 1, \ldots, n$. Let the symmetric real functions
$K : \mathbb{R} \to \mathbb{R}$ and $\hat{K} : \mathbb{R} \to \mathbb{R}$ respectively denote the covariance function and spectral density of $G$. Accordingly,
$\Sigma_n := \operatorname{cov}(X)$ is a symmetric Toeplitz matrix given by

\[
\Sigma_n = \big[ \operatorname{cov}(X_r, X_s) \big]_{r,s=1}^{n} = \left[ K\!\left( \frac{r - s}{n} \right) \right]_{r,s=1}^{n}. \tag{2.1}
\]

In the next section, we impose some regularity conditions on $K$.

2.2 Gaussian process model in increasing domain setting

In the increasing domain setting, $G - \mathbb{E}G$ is a mean-zero Gaussian process in $\mathcal{D} = [0, \infty)$ and so $\{X_k\}_{k=1}^{n}$ is
endowed with a Toeplitz covariance function. This is a common setting for time series data, where the
observed samples are indexed by time points in $\mathcal{D}$. It is customary to assume that the covariance of the
observed samples decreases as the temporal distance increases. Define $\operatorname{cov}(X_i, X_{i+k}) = f_k$ for any $k$, in
which $\{f_m\}_{m=0}^{\infty}$ is an absolutely summable sequence with $f_0 = 1$. Due to the stationarity assumption,
$\Sigma_{\mathbb{N}} := \operatorname{cov}\big( \{X_k\}_{k=1}^{\infty} \big)$ is an infinite symmetric Toeplitz matrix. We view $\{X_k\}_{k=1}^{n}$ as the observed part of an
infinite stationary time series, $\{X_k\}_{k=1}^{\infty}$. Accordingly, the covariance matrix of $\{X_k\}_{k=1}^{n}$, denoted by $\Sigma_n$, is
a symmetric (truncated) Toeplitz matrix.

It is a known fact (Chapter 4, [21]) that there is a symmetric and almost everywhere (with respect to Lebesgue
measure) positive function $f : [-\pi, \pi] \to \mathbb{R}$ such that $\Sigma_{\mathbb{N}} = \mathcal{T}_{\mathbb{N}}(f)$. Thus $\Sigma_n = \mathcal{T}_n(f)$. For studying the
asymptotic properties of the change detection algorithm, certain regularity conditions are required on $f$.

2.3 Detection procedure based on generalized likelihood ratio test

Now we proceed to formulate the detection of the existence of a sudden change in the mean of a one
dimensional Gaussian process as a composite hypothesis testing problem. As noted above, we are dealing
with two different settings of the domains. That is, we assume that $G$ satisfies either of the two conditions:

(a) Fixed domain setting: $G$ is restricted to $\mathcal{D} = [0,1]$ where its spectral density admits Assumption 3.1
and $\mathcal{D}_n = \{k/n\}_{k=1}^{n}$.

(b) Increasing domain setting: The domain of $G$ is $\mathcal{D} = \mathbb{R}$ and the samples are taken at $\mathcal{D}_n = \{1, \ldots, n\}$.
Moreover, $\Sigma_n = \mathcal{T}_n(f)$ for some $f$ fulfilling Assumption 3.2.

Although the domain settings are different, the detection procedure that we propose based on a generalized
likelihood ratio test will be the same. The composite hypothesis testing problem is set out as follows.
Under the null hypothesis, all the random variables have zero mean, i.e. $\mathbb{E}X = 0_n$. To specify the alternative
hypothesis $H_1$ we first introduce a few additional notations. Let $t \in \mathcal{C}_{n,\alpha}$ denote the occurrence time of the
single change point. The set $\mathcal{C}_{n,\alpha} \subseteq \{1, \ldots, n\}$ contains the plausible occurrence times of the change, and we
assume there is $\alpha \in (0, 1/2)$ such that $\mathcal{C}_{n,\alpha} = \{t : t \wedge (n - t) \ge \alpha n\}$. Another important parameter $b$ denotes
the amount of shift in the mean before and after the change point. Thus, for a fixed $t \in \mathcal{C}_{n,\alpha}$, the
alternative hypothesis associated with $t$ can be stated as

\[
H_{1,t} : \exists\, b \ne 0, \quad \mathbb{E}X = \frac{b}{2} \zeta_t, \tag{2.2}
\]

where $\zeta_t \in \mathbb{R}^n$ is given by $\zeta_t(k) := \operatorname{sign}(k - t)$ for any $t \in \mathcal{C}_{n,\alpha}$. Since $t$ is not known a priori, the alternative
hypothesis is specified by taking the union of the $H_{1,t}$. Thus, the composite hypothesis testing problem is given
by

\[
H_0 : \mathbb{E}X = 0_n, \quad \text{v.s.} \quad H_1 = \bigcup_{t \in \mathcal{C}_{n,\alpha}} H_{1,t}, \ \text{i.e.} \ \exists\, t \in \mathcal{C}_{n,\alpha},\ b \ne 0, \ \text{s.t.} \ \mathbb{E}X = \frac{b}{2} \zeta_t. \tag{2.3}
\]

Next, we propose a test statistic which is constructed from the generalized likelihood ratio (GLR). Note that
the GLR is an explicit function of the joint density of the samples and so the Gaussian process assumption is
essential to its calculation.
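
The ingredients of the hypothesis testing problem (2.2) and (2.3) can be written down directly. The sketch below is an illustration with hypothetical helper names (`candidate_set`, `zeta`, `mean_under_alternative`); it assumes the non-strict form $t \wedge (n - t) \ge \alpha n$ for $\mathcal{C}_{n,\alpha}$ and takes $\zeta_t(k) = \operatorname{sign}(k - t)$ literally, so the entry at $k = t$ is zero.

```python
import numpy as np

def candidate_set(n, alpha):
    """C_{n,alpha} = {t in {1..n} : min(t, n - t) >= alpha * n} (assumed non-strict)."""
    return [t for t in range(1, n + 1) if min(t, n - t) >= alpha * n]

def zeta(n, t):
    """Vector zeta_t in R^n with entries sign(k - t) for k = 1..n."""
    k = np.arange(1, n + 1)
    return np.sign(k - t).astype(float)

def mean_under_alternative(n, t, b):
    """Mean vector E X = (b / 2) * zeta_t under H_{1,t}."""
    return 0.5 * b * zeta(n, t)

print(candidate_set(10, 0.2))
print(mean_under_alternative(5, 2, b=2.0))
```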
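Proposition 2.1. Assuming that $\Sigma_n$ is known, there exists $R_{n,\delta} > 0$ for which the GLRT is given by

\[
T_{\mathrm{GLRT}} = \mathbb{1}\left( \max_{t \in \mathcal{C}_{n,\alpha}} \frac{\left| \zeta_t^{\top} \Sigma_n^{-1} X \right|^2}{\zeta_t^{\top} \Sigma_n^{-1} \zeta_t} \ge R_{n,\delta} \right). \tag{2.4}
\]
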


The threshold value $R_{n,\delta}$ depends only on $n$ and some parameter $\delta$ determining the false alarm rate.
The precise form of $R_{n,\delta}$ will be presented in subsequent sections. We also note that setting $\mu = 0$ in (1.1)
results in a substantially simplified expression of the GLR, which eases the exposition of our analysis of the
computational and theoretical properties of the proposed test. The general form of the GLRT, when $\mu$ is
unknown, is presented as Proposition A.1 in Appendix A.
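
As a rough illustration of how such a statistic can be computed when $\mu = 0$, the sketch below evaluates $(\zeta_t^{\top} \Sigma_n^{-1} X)^2 / (\zeta_t^{\top} \Sigma_n^{-1} \zeta_t)$ for every $t \in \mathcal{C}_{n,\alpha}$; this is the quantity obtained by maximizing the Gaussian likelihood ratio over $b$ for a fixed $t$. The function name `glrt_statistic`, the exponential covariance in the demo and all numerical values are assumptions made for this example only, and passing an estimated covariance in place of `sigma` gives a plug-in variant.

```python
import numpy as np

def glrt_statistic(x, sigma, alpha):
    """Max over t in C_{n,alpha} of (zeta_t' S^{-1} x)^2 / (zeta_t' S^{-1} zeta_t).

    x: observations (length n); sigma: n x n positive definite covariance.
    Returns (maximal value, maximizing t).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    ts = np.array([t for t in range(1, n) if min(t, n - t) >= alpha * n])
    k = np.arange(1, n + 1)[:, None]
    Z = np.sign(k - ts[None, :]).astype(float)   # column i is zeta_{ts[i]}
    sinv_x = np.linalg.solve(sigma, x)           # Sigma^{-1} x
    sinv_Z = np.linalg.solve(sigma, Z)           # Sigma^{-1} zeta_t for all t at once
    num = (Z.T @ sinv_x) ** 2
    den = np.einsum("ij,ij->j", Z, sinv_Z)
    vals = num / den
    i = int(np.argmax(vals))
    return float(vals[i]), int(ts[i])

# Demo in a fixed domain flavour: Sigma_n = [K((r - s)/n)] with K(r) = exp(-|r|) (illustrative).
rng = np.random.default_rng(0)
n, alpha, b, t_star = 200, 0.1, 3.0, 120
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
sigma = np.exp(-lags / n)
mean = 0.5 * b * np.sign(np.arange(1, n + 1) - t_star)
x = mean + np.linalg.cholesky(sigma) @ rng.standard_normal(n)
print(glrt_statistic(x, sigma, alpha))
```

Solving against all $\zeta_t$ in one call keeps the cost at a single factorization of $\Sigma_n$; exploiting the Toeplitz structure would be a natural refinement but is not attempted here.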

Unlike the CUSUM test, cf. Eq. (1.2), the covariance function of $G$ is explicitly taken into account in the
GLRT. As a result, it will be shown in the sequel that the proposed detection method is optimal, while the
same cannot be said for the CUSUM test, specifically in the setting of fixed domain asymptotics. In practice,
however, the covariance is not known and needs to be estimated. To address such scenarios, we propose to
approximate the likelihood ratio by plugging in a positive definite estimate of the covariance matrix, which
will be indicated by $\tilde{\Sigma}_n$, in Eq. (2.4). The plug-in detection technique will be called plug-in GLRT. Here
is the formulation of the plug-in GLRT, while the choice of $\tilde{\Sigma}_n$ and the accompanying theory will be given
later in Section 4.

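Definition 2.1. Let $\tilde{\Sigma}_n$ be a positive definite estimate of $\Sigma_n$. The plug-in GLRT is given by

\[
\tilde{T}_{\mathrm{GLRT}} = \mathbb{1}\left( \max_{t \in \mathcal{C}_{n,\alpha}} \frac{\left| \zeta_t^{\top} \tilde{\Sigma}_n^{-1} X \right|^2}{\zeta_t^{\top} \tilde{\Sigma}_n^{-1} \zeta_t} \ge R_{n,\delta} \right) \tag{2.5}
\]

for some strictly positive threshold value $R_{n,\delta}$.
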


3 Detection rate of GLRT: known $\Sigma_n$

This section is devoted to a treatment of the detection rate of the GLRT, given that $\Sigma_n$ is known. Section
3.1 addresses this problem in the fixed domain, while Section 3.2 considers the increasing domain. We first
define the risk measure that we will consider in the subsequent sections.

Definition 3.1. For any change detection algorithm $T \in \{0, 1\}$, the conditional detection error probability
(CDEP) of $T$, which is denoted by $\varphi_n(T)$, is defined as

\[
\varphi_n(T) = \mathbb{P}(T = 1 \mid H_0) + \max_{t \in \mathcal{C}_{n,\alpha}} \mathbb{P}(T = 0 \mid H_{1,t}).
\]

Remark 3.1. In words, $\varphi_n$ is the sum of the false positive rate and the maximal miss detection rate taken
over the set of possible change point locations $\mathcal{C}_{n,\alpha}$. Clearly, CDEP hinges on the choice of $\mathcal{C}_{n,\alpha}$: the value
of $\varphi_n$ increases as $\mathcal{C}_{n,\alpha}$ becomes a larger proper subset of $\{1, \ldots, n\}$. CDEP as a risk measure has been
considered in the literature for detecting abnormal clusters in a network (see e.g., laiio]). It also provides
an upper bound on the Bayesian risk measure. We refer the reader to [2] for a prudent comparison of CDEP
and the Bayesian risk measure.

In the following results, we seek sufficient conditions on the shift value $b$ under which the CDEP of the
GLRT is bounded above by some small $\delta \in (0, 1)$.
3.1 Detection rate in fixed domain setting

The results in this sub-section are guaranteed for two common classes of covariance function, one of which
admits a polynomially decaying spectral density, and the other is the Gaussian covariance function. Throughout
this section we assume that $G$ is a one dimensional Gaussian process restricted to $\mathcal{D} = [0,1]$ whose covariance
function and spectral density are denoted by $K$ and $\hat{K}$, respectively. We first focus on the case of polynomially
decaying $\hat{K}$.

Assumption 3.1. $K$ is an integrable positive definite covariance function. Moreover, there exist $\nu \in (0, \infty)$
and $C_K$ (depending on $K$) for which $\hat{K}$ satisfies the following condition:

\[
C_K := \sup_{\omega \in \mathbb{R}} \hat{K}(\omega) \left( 1 + \omega^2 \right)^{\nu + 1/2} < \infty. \tag{3.1}
\]

We shall always choose the largest possible $\nu$ that satisfies (3.1). It is simple to see that Assumption 3.1
holds if and only if $\hat{K}$ is bounded at the origin and $\hat{K}(\omega) \asymp |\omega|^{-(2\nu + 1)}$ as $\omega$ tends to infinity. It is well-known
that the tail behavior of $\hat{K}$ is closely linked to the smoothness of $K$ at the origin (e.g., Section 2.8, [53]).
The following are a few examples of common covariance functions that will be studied in this paper.


(a) Matern: This class is widely used in geostatistics, and has a fairly simple explicit form of spectral
density.


A (a;) 





-(^+ 112 ) 


(3.2) 


in which $\rho, \nu, \sigma \in (0, \infty)$. Regardless of the choice of $\rho$ and $\sigma$, condition (3.1) holds for the Matern spectral
density with parameter $\nu$.


(b) Powered exponential: Another versatile class of covariance functions is

\[
K(r) = \sigma^2 \exp\left( - \left| \frac{r}{\rho} \right|^{\beta} \right) \tag{3.3}
\]

for some $\beta \in (0, 2)$ and $\rho, \sigma \in (0, \infty)$. Although the spectral density does not have a closed form in
terms of simple functions, Lemma B.3 shows that $K$ admits Assumption 3.1 with $\nu = \beta/2$.

(c) Rational spectral densities: Rational spectral densities form a general class admitting Assumption 3.1.
For any $\hat{K}$ in this class, there are two polynomials, $Q_n$ and $Q_d$, with real coefficients, unit leading
coefficients and $p := \deg(Q_d) - \deg(Q_n) \in \mathbb{N}$, such that

\[
\hat{K}(\omega) = \lambda \, \frac{\left| Q_n(j\omega) \right|^2}{\left| Q_d(j\omega) \right|^2}. \tag{3.4}
\]

Moreover, we assume that $Q_d$ has no root on the imaginary axis and $\lambda$ is a strictly positive scalar. Since
$\hat{K}(0) < \infty$ and $\hat{K}(\omega) \asymp |\omega|^{-2p}$ as $\omega \to \infty$, Assumption 3.1 holds with $\nu = p - 1/2$.

(d) Triangular: For $T, \sigma \in (0, \infty)$, the covariance function and spectral density are given by


A (r) = ( 1 — 




Sine 


(? 


Triangular covariance is less favorable than the aforestated cases due to the oscillatory behaviour of $\hat{K}$
(p. 31, [53]). One can easily show that this covariance fulfils Assumption 3.1 with $\nu = 1/2$.
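
Condition (3.1) can be probed numerically for a given spectral density. The sketch below is a grid-based approximation rather than a proof; it takes the simplest rational example, $Q_n(x) = 1$ and $Q_d(x) = x + 1$ (chosen here only for illustration), so that $|Q_d(j\omega)|^2 = 1 + \omega^2$, $p = 1$ and $\nu = p - 1/2 = 1/2$. The names `rational_spectral_density` and `sup_constant` are hypothetical.

```python
import numpy as np

def rational_spectral_density(omega, num_coeffs, den_coeffs, lam=1.0):
    """lam * |Q_n(j w)|^2 / |Q_d(j w)|^2; coefficients in numpy.polyval order."""
    jw = 1j * np.asarray(omega, dtype=float)
    num = np.abs(np.polyval(num_coeffs, jw)) ** 2
    den = np.abs(np.polyval(den_coeffs, jw)) ** 2
    return lam * num / den

def sup_constant(density, nu, omegas):
    """Grid approximation of sup_w density(w) * (1 + w^2)^(nu + 1/2)."""
    return float(np.max(density(omegas) * (1.0 + omegas ** 2) ** (nu + 0.5)))

omegas = np.linspace(-1e3, 1e3, 200_001)
density = lambda w: rational_spectral_density(w, [1.0], [1.0, 1.0], lam=2.0)

print("nu = 1/2:", sup_constant(density, 0.5, omegas))  # stays bounded on the grid
print("nu = 1  :", sup_constant(density, 1.0, omegas))  # grows with the grid range
```

For this example the product $\hat{K}(\omega)(1 + \omega^2)^{\nu + 1/2}$ is constant and equal to $\lambda$ when $\nu = 1/2$, whereas any larger $\nu$ makes it unbounded, in line with choosing the largest admissible $\nu$.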












Theorem 3.1. Let $\delta \in (0, 1)$. Suppose that $G$ is a one dimensional Gaussian process restricted to $\mathcal{D} = [0, 1]$
whose associated spectral density $\hat{K}$ admits Assumption 3.1 for some $\nu$ and $C_K$. $G$ is regularly sampled on
$i/n$, $i = 1, \ldots, n$. There exist $R_{n,\delta} > 0$ (depending only on $n$ and $\delta$), $n_0 := n_0(\hat{K})$ and a positive universal
constant $C$ such that if $n \ge n_0$ and

\[
|b| \ge C n^{-\nu} \sqrt{ C_K \left( 1 + \log \frac{n (1 - 2\alpha)}{\delta} \right) }, \tag{3.5}
\]

we have

\[
\varphi_n(T_{\mathrm{GLRT}}) \le \delta.
\]

See Appendix A.2 for the proof of Theorem 3.1. We now make several comments regarding the roles of
various quantities embedded in Theorem 3.1.

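(a) $R_{n,\delta}$ in Theorem 3.1 can be chosen as

\[
R_{n,\delta} = 1 + 2 \left[ \log \frac{2n (1 - 2\alpha)}{\delta} + \sqrt{ \log \frac{2n (1 - 2\alpha)}{\delta} } \right]. \tag{3.6}
\]

We guarantee that CDEP is less than or equal to $\delta$ by controlling the false alarm and miss detection probabilities
below $\delta/2$. This argument provides a rationale for the choice of $R_{n,\delta}$. Notice that under the null hypothesis, the
test statistic in Eq. (2.4) has the same distribution as the supremum of a $\chi^2_1$ process over $\mathcal{C}_{n,\alpha}$, which
is represented by $\{\Psi(t) : t \in \mathcal{C}_{n,\alpha}\}$. Strictly speaking, for controlling the false alarm probability below
$\delta/2$, $R_{n,\delta}$ needs to be chosen in such a way that

\[
\mathbb{P}\left( \max_{t \in \mathcal{C}_{n,\alpha}} \Psi(t) \ge R_{n,\delta} \right) \le \frac{\delta}{2}.
\]

The standard $\chi^2_1$ tail inequality in [7] implies that if $R_{n,\delta}$ is chosen based upon Eq. (3.6), then
$\mathbb{P}(\Psi(t) \ge R_{n,\delta})$ is below $\delta / \{2n (1 - 2\alpha)\}$ for any $t \in \mathcal{C}_{n,\alpha}$. Thus, the union bound inequality yields

\[
\mathbb{P}\left( \sup_{t \in \mathcal{C}_{n,\alpha}} \Psi(t) \ge R_{n,\delta} \right) \le |\mathcal{C}_{n,\alpha}| \max_{t \in \mathcal{C}_{n,\alpha}} \mathbb{P}\left( \Psi(t) \ge R_{n,\delta} \right) \le \frac{\delta\, |\mathcal{C}_{n,\alpha}|}{2n (1 - 2\alpha)} = \frac{\delta}{2}.
\]
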

(b) The minimal detectable shift is proportional to y/C k, as defined in (EH). Note that Ck is determined by 
both low frequency and tail behaviour of spectral density via v. (E.g., for Matern covariance functions 

given by (|3.2I) . Ck = (i v p-'^'^^). It is easily verifiable by (|3.1I) that Ck is linearly 

proportional to xCCjO), meaning that Ck also captures the notion of the standard deviation of the 
observations. Thus, Theorem 13.11 implicitly expresses that change detection is more challenging for 
Gaussian processes with larger variance. 

(c) The sample size $n$ has two opposing effects on the detection rate. On the one hand, $n(1-2\alpha)$, appearing in
the logarithmic function, is closely connected to the size of the alternative hypothesis, which is determined
by $|C_{n,\alpha}| = n(1-2\alpha)$. On the other hand, the term $n^{-\nu}$ indicates that smaller shifts become detectable
as more observations are available.
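
The threshold choice discussed in comment (a) can be checked numerically. The following is a minimal sketch, assuming the threshold has the form $1 + 2[L + \sqrt{L}]$ with $L = \log(2n(1-2\alpha)/\delta)$ as in Eq. (3.6). The function names are illustrative and not taken from the paper. It relies on the identity $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$ to evaluate the exact single-point false alarm probability and the resulting union bound.

```python
import math

def glrt_threshold(n, alpha, delta):
    """Threshold 1 + 2*(L + sqrt(L)) with L = log(2n(1 - 2*alpha)/delta) (assumed form of Eq. (3.6))."""
    L = math.log(2.0 * n * (1.0 - 2.0 * alpha) / delta)
    return 1.0 + 2.0 * (L + math.sqrt(L))

def chi2_1_tail(x):
    """Exact tail P(chi^2_1 > x) for x >= 0, via the complementary error function."""
    return math.erfc(math.sqrt(x / 2.0))

n, alpha, delta = 500, 0.1, 0.05
R = glrt_threshold(n, alpha, delta)
num_candidates = n * (1.0 - 2.0 * alpha)   # |C_{n,alpha}| = n(1 - 2*alpha)
per_point = chi2_1_tail(R)                 # false alarm probability at a single candidate t
union_bound = num_candidates * per_point   # union bound on the overall false alarm probability
print(R, per_point, union_bound)
```

With these illustrative values the exact tail is far below the target $\delta/\{2n(1-2\alpha)\}$, which reflects that the tail inequality is a conservative bound.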


We will see in Section 3.2 that the parameter $\delta$, the variance of the observations and the sample size have almost
analogous roles in increasing domain change detection. The main difference between the two asymptotic
settings is the role of the decay rate of $\hat{K}$ in the fixed domain, which is encapsulated in $\nu$. Note that $\nu$ is closely
related to the smoothness of $G$: larger values of $\nu$ correspond to smoother Gaussian processes in the mean
squared sense (cf. [53], Chapter 2). For smooth Gaussian processes, $G(t_0)$ can be interpolated from the
observations in the vicinity of $t_0$ with small estimation error. This makes shift-in-mean detection easier
for smoother processes. More precisely, as $n \to \infty$ the lower bound on the detectable $b$ in (3.5) vanishes more
rapidly for larger $\nu$.
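
To make the effect of smoothness concrete, here is a small sketch that evaluates the scale $n^{-\nu}\sqrt{\log(n(1-2\alpha)/\delta)}$ of the detectable shift for several values of $\nu$. The constants $C$ and $C_K$ are omitted, so only the relative decay in $n$ is meaningful, and the function name and default values are illustrative assumptions.

```python
import math

def detectable_shift_scale(n, nu, alpha=0.1, delta=0.05):
    """Scale n^{-nu} * sqrt(log(n(1 - 2*alpha)/delta)) of the bound; constants C and C_K are omitted."""
    return n ** (-nu) * math.sqrt(math.log(n * (1.0 - 2.0 * alpha) / delta))

# Larger nu (smoother process) gives a faster decay of the detectable shift in n.
for nu in (0.5, 1.0, 1.5):
    scales = [detectable_shift_scale(n, nu) for n in (100, 1000, 10000)]
    print(nu, ["%.2e" % s for s in scales])
```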

Remark 3.2. We describe the rate of the minimal detectable jump for some commonly used classes
of spectral densities, all of which satisfy Assumption 3.1.


(a) Matern: For the Matern class with parameters $(\sigma,\rho,\nu)$ satisfying Assumption 3.1, the smallest detectable
shift in mean is $|b| \asymp n^{-\nu}\sqrt{\log(n(1-2\alpha)/\delta)}$.

(b) Powered exponential: For the powered exponential class, any jump size of magnitude at least
$|b| = \sqrt{n^{-\beta}\log(n(1-2\alpha)/\delta)}$ is detectable. In contrast to the Matern class, obtaining a closed form
for $C_K$ is quite difficult for the powered exponential class.

(c) Rational spectral densities: It has been discussed previously that $\hat{K}(\omega) \asymp |\omega|^{-2p}$ as $\omega \to \infty$ and
$\nu = p - 1/2$, revealing that each $|b| = \Omega\left(n^{-(p-1/2)}\sqrt{\log(n(1-2\alpha)/\delta)}\right)$ can be detected with high
probability.

(d) Triangular: Since $\hat{K}$ satisfies Assumption 3.1 with $\nu = 1/2$, any $|b| = \Omega\left(\sqrt{n^{-1}\log(n(1-2\alpha)/\delta)}\right)$
is detectable. Although finding $C_K$ exactly requires considerable algebraic effort, it can easily be shown that
$C_K \leq \sigma^2(2/\rho + \rho/2)$.

We conclude this section with an explanation of the role of $\alpha$ in Theorem 3.1. The dependence
on $\alpha$ in Eq. (3.5) is logarithmic, which encodes how the size of $C_{n,\alpha}$ affects the detection rate.


Remark 3.3. The minor role of $\alpha$ in Eq. (3.5) may seem a bit surprising. Strictly speaking, the asymptotic
behavior of the smallest detectable jump remains the same regardless of how small $\alpha$ is chosen (even
if $\alpha$ tends to zero). It is also worth noting that our proof does not use the assumption that $\alpha$ is a fixed and
strictly positive scalar. This puzzling aspect can be resolved by a closer look at the formulation
of the hypothesis testing problem (2.3). For algebraic convenience, we assume that the mean of $G$ fluctuates
around $\mu = 0$ in Eq. (2.3). The fact that $\mu$ is assumed known is the main reason that $\alpha$ plays only a minor role in the
detection rate of the GLRT, as we do not need to estimate $\mu$ from the data. That is why in this particular case $\alpha$
can even be chosen as small as $O(1/n)$. We emphasize that the generic form of the GLRT for unknown
$\mu$ is presented in Proposition A.1. We believe that in the analysis of the extended version of the GLRT, the
constant $C$ in Eq. (3.5) depends on $\alpha$ (without changing the dependence on $n$ and other parameters).


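Gaussian covariance function. The Gaussian covariance function is given by

$$K(r) = \sigma^2 \exp\left(-\frac{r^2}{2\rho^2}\right), \qquad \hat{K}(\omega) = \rho\sigma^2\sqrt{2\pi}\,\exp\left(-\frac{\rho^2\omega^2}{2}\right). \tag{3.7}$$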


It is widely used in practice for modeling smooth Gaussian processes, e.g. in [40]. Regarding this choice
of covariance, we have the following result:

Theorem 3.2. Let $G$ be a Gaussian process on $[0,1]$ which is observed at $i/n$, $i = 1,\dots,n$, whose covariance
function is given by Eq. (3.7). Choose $\delta \in (0,1)$. There are $R_{n,\delta} > 0$, $n_0 := n_0(\rho)$, $C_0 := C_0(\rho) > 0$ and a
universal constant $C > 0$, such that if $n \geq n_0$ and

$$|b| \geq C\sqrt{\exp\left(-n\log(C_0 n)\right)\,\log\left(\frac{n(1-2\alpha)}{\delta}\right)}, \tag{3.8}$$

then

$$\varphi_n(T_{\mathrm{GLRT}}) \leq \delta.$$

The details of the proof are given in Appendix A.2. Because of the super-exponential decay of the Gaussian
spectral density, Assumption 3.1 is actually satisfied for any $\nu > 0$. This result shows that it is possible to
detect an exponentially small jump size $b$ as $n$ increases. Moreover, the result of Theorem 3.2 is also compatible
with that of Theorem 3.1 in quantifying precisely the assertion that the smoother the Gaussian process is,
the easier it is to detect the presence of a shift in the mean.

3.2 Detection rate in increasing domain setting

Turning now to the increasing domain setting, recall that $G$ is assumed to follow the setting described in
Section 2. We also assume that $\Sigma_n = T_n(f)$ for some function $f$ satisfying the following conditions.

Assumption 3.2. $f : [-\pi,\pi] \to \mathbb{R}$ is a real symmetric function such that

(a) There are two positive universal scalars, $0 < m_f \leq M_f < \infty$, such that

$$m_f := \inf_{\omega} f(\omega) \leq M_f := \sup_{\omega} f(\omega).$$

(b) There exist positive universal constants $c$ and $\lambda$ such that

$$|f_k| \leq c\,(1+k)^{-(1+\lambda)}. \tag{3.9}$$

Note that the first condition, regarding the infimum of $f$, is necessary to have a positive definite infinite
covariance matrix, i.e., the associated quadratic form is strictly positive for any non-zero vector. Moreover, the
polynomial decay of the $f_k$'s as stated in (3.9) is a sufficient condition to ensure that $f$ can be equivalently
expressed by its Fourier series. Such a condition is common in the non-asymptotic analysis of Toeplitz matrices (see, e.g., [22]).

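Theorem 3.3. Let $\delta \in (0,1)$ and suppose that $\Sigma_n = T_n(f)$ in which $f$ satisfies Assumption 3.2 for some
positive scalars $c$ and $\lambda$. There exist $n_0 \in \mathbb{N}$, $C > 0$ (depending only on $c$ and $\lambda$) and $R_{n,\delta} > 0$ such that for
any $n \geq n_0$, if

$$|b| \geq C\sqrt{f(0)\,n^{-1}\log\left(\frac{n(1-2\alpha)}{\delta}\right)}, \tag{3.10}$$

then

$$\varphi_n(T_{\mathrm{GLRT}}) \leq \delta.$$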

See Appendix A.2 for the proof of Theorem 3.3. Some comments are in order. First, the threshold $R_{n,\delta}$
in Theorem 3.3 is chosen in exactly the same way as in the fixed domain setting, as given by Eq. (3.6).
Second, in contrast to the fixed domain setting, the dependence structure of $G$ no longer plays the central
role in the characterization of the detection performance. In particular, $f(0)$ is the only factor in (3.10)
that captures the correlation in the samples, but this scalar quantity evidently has an insignificant effect:
the asymptotic behaviour of the GLRT remains the same (up to some constant factor) for different Gaussian
processes satisfying Assumption 3.2. A related observation that arises by comparing (3.5) and (3.10)
is that the correlation structure of the observations, which is encapsulated in $\nu$ or $f(0)$, and the quantities
encoding the marginal density information such as $n$ are completely decoupled in the rate of the GLRT in the
increasing domain. An examination of the proof reveals that the decoupling effect in the increasing domain
setting arises from the short-range correlation assumption ($\mathrm{cov}(X_r, X_s) \to 0$ polynomially in $|r-s|$). It
follows that as $n$ increases the correlation for most pairs of observed samples becomes negligible.
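
As an illustration of how $f(0)$ summarizes the correlation, the sketch below approximates $f(0)$ for an exponentially and a polynomially decaying covariance sequence. It assumes the Fourier convention $f(\omega) = \sum_k f_k e^{ik\omega}$, so that $f(0) = f_0 + 2\sum_{k \geq 1} f_k$ for a symmetric sequence, and it evaluates the scale $\sqrt{f(0)\log(n(1-2\alpha)/\delta)/n}$ with the constant $C$ omitted. The function names, the truncation level and the parameter values are illustrative assumptions.

```python
import math

def spectral_density_at_zero(coeff, kmax=200000):
    """Approximate f(0) = f_0 + 2*sum_{k>=1} f_k by truncating the series at kmax."""
    return coeff(0) + 2.0 * sum(coeff(k) for k in range(1, kmax + 1))

def exponential_coeff(k, sigma0=1.0, rho0=2.0):
    """f_k = sigma0^2 * exp(-|k| / rho0)."""
    return sigma0 ** 2 * math.exp(-abs(k) / rho0)

def polynomial_coeff(k, sigma0=1.0, rho0=2.0, lam=0.5):
    """f_k = sigma0^2 * (1 + |k| / rho0)^{-(1 + lam)}; the truncated sum converges slowly here."""
    return sigma0 ** 2 * (1.0 + abs(k) / rho0) ** (-(1.0 + lam))

def increasing_domain_scale(f0, n, alpha=0.1, delta=0.05):
    """Scale sqrt(f(0) * log(n(1 - 2*alpha)/delta) / n); the constant C is omitted."""
    return math.sqrt(f0 * math.log(n * (1.0 - 2.0 * alpha) / delta) / n)

f0_exp = spectral_density_at_zero(exponential_coeff)
f0_poly = spectral_density_at_zero(polynomial_coeff)
print(f0_exp, f0_poly)
print(increasing_domain_scale(f0_exp, 500), increasing_domain_scale(f0_poly, 500))
```

For the exponential sequence the sum is geometric, so the truncated value can be compared with the closed form $\sigma_0^2(1+r)/(1-r)$, $r = e^{-1/\rho_0}$. The heavier polynomial tail gives a larger $f(0)$, which changes the scale only by a constant factor.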

Remark 3.4. Although for algebraic convenience we assume throughout the paper that $\alpha$ is a fixed and
positive constant, this is not necessary for our proof technique in Theorem 3.3. We can generalize
Theorem 3.3 to the case that $\alpha$ tends to zero as $n \to \infty$. A closer look at our proof (see A.2) and the
auxiliary results in Appendix C reveals that Eq. (3.10) can be replaced by the more complicated form

$$|b| \geq C\sqrt{\frac{\log\left(n(1-2\alpha)/\delta\right)}{n}} \times \sqrt{f(0) - \epsilon_n},$$

in which $\epsilon_n$ is a vanishing sequence in $n$, which can easily be eliminated from the detection rate by adjusting
the universal constant $C$. We refer the reader to Remark 3.3 for a detailed discussion of the role of $\alpha$ in the
case that $\mu$ is unknown in the formulation of the shift-in-mean problem (1.1).

4 Detection rate of plug-in GLRT

As we have shown in Proposition 2.1, full knowledge of $\Sigma_n$ is central to computing the generalized likelihood
ratio. In practice, the spectral density and covariance function of $G$ are not known a priori, and so we take
a plug-in approach, approximating the GLRT by replacing the covariance with an estimate (see the definition of the
plug-in GLRT in Section 2). This section investigates various ways of constructing the plug-in GLRT and assesses its detection
performance. We focus only on the fixed domain setting, because the dependence structure underlying the
hypothesis plays an important role in determining the detection error rate, as shown in Section 3.2.

We first assume that $G$ is a Matern Gaussian process on $\mathcal{D} = [0,1]$ with unknown parameters $\eta = (\sigma,\rho) \in
\Omega$ (see Eq. (3.2)), and is regularly observed on $\mathcal{D}_n = \{k/n\}_{k=1}^n$. We use $\hat{\eta}_m = (\hat{\sigma}_m, \hat{\rho}_m)$ to indicate the estimated
parameters using $m$ regularly spaced samples in $\mathcal{D}$. We also assume that $C_{n,\alpha} \subset \{k : \alpha n \leq k \leq (1-\alpha)n\}$.
Namely, the Gaussian process is under control for a certain number of observations. The controlled samples
before the sudden change, $\mathcal{X}_0 := \{X_k : k \leq \alpha n\}$, will be used to estimate $\eta$. The parameter estimation
stage is typically called the burn-in period in the literature.

It is known in the Gaussian process literature that $\eta$ is not consistently estimable in the fixed domain
setting when the number of observations in $\mathcal{D}$ grows to infinity. Zhang
showed that neither $\sigma$ nor $\rho$ is consistently estimable, but the quantity $\sigma\rho^{-\nu}$ can be consistently estimated
using MLE. The underlying reason for the inconsistency is the existence of a class of mutually absolutely
continuous models for $G$ which are almost surely impossible to discern by observing one realization of $G$.
Strictly speaking, the induced measures corresponding to two Matern Gaussian processes with parameters $\eta$
and $\eta'$ are absolutely continuous with respect to each other whenever $\sigma\rho^{-\nu} = \sigma'\rho'^{-\nu}$. Furthermore, Zhang
showed that if one fixes $\rho$ at an arbitrary value, then the maximum likelihood estimator of $\sigma\rho^{-\nu}$ is
consistent. We shall show that despite the inconsistency in estimating $\eta$, the plug-in GLRT performs
analogously to the GLRT with fully known covariance function whenever the estimate of $\eta$ is consistent up to
the equivalence class.

It has been discussed in the literature that fixing $\hat{\rho}_m$ at a large value has a negligible impact on predictive performance. Note
that, due to the complicated dependence of the Matern covariance function on $\rho$, estimating $\rho$ is a computationally
challenging task, particularly for large data sets. We can therefore accelerate the whole detection procedure by not
estimating $\rho$. The plug-in GLRT change detector is a two stage algorithm, as follows:

• Estimation step:

1. Fix $\hat{\rho}_m$ at the largest possible element in $\Omega$. Namely, $\hat{\rho}_m$ is a deterministic quantity given by $\hat{\rho}_m =
\sup\{\rho : (\sigma,\rho) \in \Omega\}$.

2. Estimate $\sigma\rho^{-\nu}$ given the controlled samples $\mathcal{X}_0$, using any consistent procedure such as maximum
likelihood (MLE), weighted local Whittle likelihood [55], or averaging quadratic variation [5]. We
use the term consistent to refer to the case that $|\sigma\rho^{-\nu} - \hat{\sigma}_m\hat{\rho}_m^{-\nu}| \to 0$ as $m$ grows to infinity.

3. Construct the approximated covariance matrix of $X$, denoted $\hat{\Sigma}_n$, by evaluating the covariance function at $\hat{\eta}_m$ (here $m := \lfloor \alpha n \rfloor$).

• Detection step:

1. Apply the GLRT by plugging $\hat{\Sigma}_n$ in place of $\Sigma_n$ into (2.4), as described in the definition of the plug-in GLRT.
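
The two stage procedure above can be sketched as follows. This is a minimal illustration under several assumptions that are not taken from the paper: the covariance is exponential, $K(r) = \sigma^2\exp(-|r|/\rho)$ (the Matern case $\nu = 1/2$ up to parameterization); the estimation step uses the closed-form variance MLE at the fixed range $\rho_{\max}$; and the GLRT statistic is assumed to take the standard form $\max_t (\zeta_t^\top\Sigma^{-1}X)^2 / (\zeta_t^\top\Sigma^{-1}\zeta_t)$ with $\zeta_t$ equal to $-1$ up to $t$ and $+1$ afterwards. All function names are hypothetical.

```python
import numpy as np

def exp_corr(n_points, n_grid, rho):
    """Correlation matrix exp(-|r - s| / (n_grid * rho)) on the first n_points grid points k / n_grid."""
    pts = np.arange(1, n_points + 1) / n_grid
    return np.exp(-np.abs(pts[:, None] - pts[None, :]) / rho)

def variance_mle(x, corr):
    """MLE of sigma^2 for zero-mean x ~ N(0, sigma^2 * corr), with corr held fixed."""
    return float(x @ np.linalg.solve(corr, x)) / len(x)

def glrt_statistic(x, cov, candidates):
    """max over t of (zeta_t' cov^{-1} x)^2 / (zeta_t' cov^{-1} zeta_t) (assumed form of the statistic)."""
    n = len(x)
    w = np.linalg.solve(cov, x)
    best = 0.0
    for t in candidates:
        zeta = np.where(np.arange(1, n + 1) <= t, -1.0, 1.0)
        num = float(zeta @ w) ** 2
        den = float(zeta @ np.linalg.solve(cov, zeta))
        best = max(best, num / den)
    return best

def plug_in_glrt(x, alpha, rho_max, threshold):
    """Estimation step on the first floor(alpha*n) samples, then detection with the plugged-in covariance."""
    n = len(x)
    m = int(np.floor(alpha * n))
    sigma2_hat = variance_mle(x[:m], exp_corr(m, n, rho_max))  # range fixed at rho_max
    cov_hat = sigma2_hat * exp_corr(n, n, rho_max)
    stat = glrt_statistic(x, cov_hat, range(m, n - m + 1))
    return stat, stat >= threshold

# Illustrative run: true range 0.5, misspecified range rho_max = 2, shift b at t0.
rng = np.random.default_rng(0)
n, alpha, delta, b, t0 = 200, 0.1, 0.05, 1.0, 100
zeta0 = np.where(np.arange(1, n + 1) <= t0, -1.0, 1.0)
x = 0.5 * b * zeta0 + np.linalg.cholesky(exp_corr(n, n, 0.5)) @ rng.standard_normal(n)
L = np.log(2 * n * (1 - 2 * alpha) / delta)
stat, reject = plug_in_glrt(x, alpha, rho_max=2.0, threshold=1 + 2 * (L + np.sqrt(L)))
print(stat, reject)
```

The loop solves one linear system per candidate change point, which is simple but not efficient; a factorization of the covariance could be reused in a more careful implementation.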

Now we turn to state the main result of this section regarding the rate of the plug-in GLRT. 

Theorem 4.1. Let $\delta \in (0,1)$. Let $G$ be a Gaussian process whose associated spectral density $\hat{K}$ has Matern
form with unknown parameters $(\sigma,\rho) \in \Omega$. Given regular samples of one realization of $G$, there are a finite
scalar $C$, $n_0 \in \mathbb{N}$, a non-negative sequence $\{\tau_m\}$ with $\lim_{m\to\infty}\tau_m = 0$, and a threshold level $R_{n,\delta} > 0$ such that for any
$n \geq n_0$,

$$\varphi_n(\hat{T}_{\mathrm{GLRT}}) \leq \delta + 2\tau_m,$$

whenever

$$|b| \geq C n^{-\nu}\sqrt{C_K\left(1 + \log\left(\frac{n(1-2\alpha)}{\delta}\right)\right)}. \tag{4.1}$$

See Appendix A.3 for the proof of Theorem 4.1.

Remark 4.1. The threshold value for the plug-in GLRT is chosen exactly as in Theorem 3.1:

$$R_{n,\delta} = 1 + 2\left[\log\left(\frac{2n(1-2\alpha)}{\delta}\right) + \sqrt{\log\left(\frac{2n(1-2\alpha)}{\delta}\right)}\right]. \tag{4.2}$$


Since $\{\tau_m\}$ is a vanishing sequence and $m = \lfloor \alpha n \rfloor$ is an increasing function of $n$, $(\delta + 2\tau_m)$ lies in
the vicinity of $\delta$ for large $n$. It is worth mentioning that the constant $C$ appearing in Theorem 4.1 is larger than
the corresponding scalar in Theorem 3.1, which can be viewed as the cost of mis-specifying $\rho$. The
most interesting aspect of Theorem 4.1 is perhaps that if some consistent estimate of $\sigma\rho^{-\nu}$ is available then,
regardless of its rate, the plug-in GLRT has asymptotically the same rate as the GLRT with fully known
covariance function.

We conclude this section by studying the performance of the plug-in GLRT when both the variance and the range
parameter are consistently estimable. In this case the plug-in GLRT is constructed by substituting the
estimated parameters into the GLRT test statistic. Suppose that $G$ has a powered exponential covariance
function, introduced in Eq. (3.3). Anderes [3] proposed a consistent estimate of the covariance parameters using
the empirical average of the quadratic variation of $G$. According to Theorem 5 of [3], unlike the Matern class,
both $\sigma_0$ and $\rho_0$ are consistently estimable when $\beta \in (0,1/2)$. Namely, $|\rho - \hat{\rho}_m| \vee |\sigma - \hat{\sigma}_m| \to 0$ for the
method introduced in [3]. The following result, which has a similar flavor to Theorem 4.1, determines the
detection rate of the plug-in GLRT for one dimensional powered exponential Gaussian processes.

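Theorem 4.2. Let $\delta \in (0,1)$. Let $G$ be a Gaussian process with powered exponential covariance function with
unknown parameters $(\sigma,\rho) \in \Omega$ and known $\beta \in (0,1/2)$. Given regular samples of one realization of $G$, there
are finite scalars $n_0 \in \mathbb{N}$ and $C$ (which depends on the covariance parameters $\beta$, $\sigma$ and $\rho$) and a non-negative
sequence $\{\tau_m\}$ with $\lim_{m\to\infty}\tau_m = 0$, such that for any $n \geq n_0$,

$$\varphi_n(\hat{T}_{\mathrm{GLRT}}) \leq \delta + 2\tau_m,$$

whenever

$$|b| \geq C\sqrt{n^{-\beta}\log\left(\frac{n(1-2\alpha)}{\delta}\right)}$$

and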


f^n.6 — 




Theorem 4.2 states that, given a consistent estimate of $\eta = (\sigma_0,\rho_0)$, the plug-in GLRT procedure has
the same asymptotic behavior as the GLRT with fully known parameters (see part (b) of Remark 3.2 for the
detection rate of the GLRT with known $\sigma_0$ and $\rho_0$).


5 Detection rate of CUSUM 

In this section we revisit the classical CUSUM test and present several results regarding its detection rate in
both asymptotic settings. These results should be contrasted with our earlier theorems on the performance
of the proposed exact and plug-in GLRT tests, and highlight the need to account for the dependence
structures underlying the data, especially in the fixed domain setting. In the following, Theorem 5.1 introduces
a sufficient condition on $|b|$ under which CUSUM can distinguish the null and alternative hypotheses with
high probability. Theorem 5.2 studies the performance of CUSUM in the increasing domain setting.


Theorem 5.1. Let $G$ be a Gaussian process in $[0,1]$ satisfying $\|\hat{K}\|_\infty < \infty$ and $\|\hat{K}'\|_\infty < \infty$. Moreover let
$\delta \in (0,1)$, $\alpha \in (0,1/2)$ and $C_{n,\alpha} = [\alpha n, (1-\alpha)n]$. Given $n$ samples of one realization of $G$ at $i/n$, $i = 1,\dots,n$,
there are $R_{n,\delta} > 0$ and $n_0 := n_0(\hat{K},\delta,\alpha)$ such that if $n \geq n_0$ and


|6| > 4^ 


los(2=^) 

a{l — a) ’ 


(5.1) 


then

$$\varphi_n(T_{\mathrm{CUSUM}}) \leq \delta.$$

We refer the reader to Appendix A.4 for the proof of the above result. The risk of the fixed domain CUSUM
is controlled from above under mild conditions on $K$, which hold for all the examples
of covariance functions considered in this paper. Indeed, it is clear from the following inequality that $K$ satisfies the
assumptions in Theorem 5.1 if $a(r) := rK(r)$ is absolutely integrable:

$$\|\hat{K}'\|_\infty = \sup_{\omega \in \mathbb{R}}\left|\int_{-\infty}^{\infty} a(r)\,e^{-i\omega r}\,dr\right| \leq \int_{-\infty}^{\infty} |rK(r)|\,dr.$$


The main feature of the above theorem is the sufficient condition that the jump size increases (at the order of 
log n at least) in order to have an upper bound guarantee on the detection error. Although we do not have a 
proof that this sufficient condition is also necessary, our result suggests that the CUSUM test is inconsistent 
in the fixed domain setting: the detection error may not vanish as data sample size increases, when the jump 
size is a constant. This statement is in fact verified by simulations. By contrast, we have shown earlier that 
using the GLRT based approach, we can guarantee the detection error to vanish as long as the jump size is 
either constant or (better yet) bounded from below by a suitable vanishing term. 

Remark 5.1. Let us give a qualitative argument for the inconsistency of the CUSUM test in the fixed
domain setting. Suppose that $b$ tends to zero as $n \to \infty$. Define

$$U_t = \sqrt{\frac{t(n-t)}{n}}\left(\frac{1}{n-t}\sum_{k=t+1}^{n} X_k - \frac{1}{t}\sum_{k=1}^{t} X_k\right).$$

The expected value of $U_t$ is zero under the null hypothesis, for any $t$. Regardless of the existence of a
shift in the mean, the standard deviation of $U_t$ remains the same. A careful look at the proof of Theorem
5.1 reveals that the smallest value of the standard deviation of $U_t$ over $t \in C_{n,\alpha}$ is of order $\sqrt{n}$. Moreover, if
there is a shift in the mean occurring at the change point $t \in C_{n,\alpha}$, then the expected value of $U_t$ is given by
$b\sqrt{t(n-t)/n} = O(b\sqrt{n})$ (recall that $\alpha n \leq t \leq (1-\alpha)n$). Generally speaking, as the mean of $U_t$ under
the null hypothesis, denoted by $E(U_t \mid H_0)$, is zero for any $t \in C_{n,\alpha}$, the CUSUM test cannot distinguish
between the null and the alternative (even for large sample size), since

$$\frac{E(U_t \mid H_1) - E(U_t \mid H_0)}{\sqrt{\mathrm{var}(U_t)}} = O(b) \to 0, \quad \text{as } n \to \infty.$$

Here, $E(U_t \mid H_1)$ represents the expected value of $U_t$ under the alternative. This suggests that regardless
of the sample size, the CUSUM test cannot detect the existence of a small shift in the mean in the fixed
domain setting.
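
The statistic $U_t$ discussed above can be computed with prefix sums. The following is a minimal sketch, assuming $U_t$ is the scaled difference between the sample mean after $t$ and the sample mean up to $t$, and that the test rejects when $\max_t |U_t|$ reaches a user-supplied threshold. The function names and the noise-free example are illustrative.

```python
import math

def cusum_statistics(x, alpha):
    """Return {t: U_t} for alpha*n <= t <= (1 - alpha)*n, with 0 < alpha < 1/2."""
    n = len(x)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    total = prefix[n]
    lo, hi = math.ceil(alpha * n), math.floor((1.0 - alpha) * n)
    stats = {}
    for t in range(lo, hi + 1):
        mean_before = prefix[t] / t
        mean_after = (total - prefix[t]) / (n - t)
        stats[t] = math.sqrt(t * (n - t) / n) * (mean_after - mean_before)
    return stats

def cusum_test(x, alpha, threshold):
    """Reject when max_t |U_t| >= threshold; also return the maximizing t."""
    stats = cusum_statistics(x, alpha)
    t_hat = max(stats, key=lambda t: abs(stats[t]))
    return abs(stats[t_hat]) >= threshold, t_hat

# Noise-free illustration: mean -b/2 up to t0 and +b/2 afterwards.
n, t0, b = 100, 40, 0.2
x = [-b / 2] * t0 + [b / 2] * (n - t0)
stats = cusum_statistics(x, alpha=0.1)
print(stats[t0], b * math.sqrt(t0 * (n - t0) / n))
print(cusum_test(x, alpha=0.1, threshold=0.5))
```

In the noise-free case $U_{t_0}$ equals $b\sqrt{t_0(n-t_0)/n}$, which matches the expected value stated in the remark.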

Remark 5.2. The threshold value of the CUSUM test in Theorem 5.1 is given by


Rn,5 — 


\ 


n 1 + 2 log I 


This threshold depends on $n$ differently from the GLRT threshold in Eq. (3.6), since,
unlike the GLRT, the CUSUM test does not reduce the correlation among the samples. In order to remove the
gap between the thresholds of GLRT and CUSUM in the fixed domain setting, we further normalize $U_t$ by
considering $U_t^* = U_t/\sqrt{n}$. So, equivalently, the CUSUM test in this regime can be written as


Tcusum — 1 


max 

t6C„ „ 


\u*\ > Ks ■■= 



Here $R^*_{n,\delta}$ is exactly the same as the critical value of the GLRT.

Now we aim to study the CUSUM test in the increasing domain asymptotic regime. Recall that for this
scenario, $G$ is a Gaussian process with covariance matrix $\Sigma_n = T_n(f)$.

Theorem 5.2. Let $\delta \in (0,1)$, $\vartheta > 0$ and $C_{n,\alpha} = [\alpha n, (1-\alpha)n]$. Assume that $f$ satisfies Assumption 3.2 for
some $c$ and $\lambda$. There are $n_0 = n_0(f,\vartheta)$ and a universal constant $C(\lambda,c) > 0$ such that if $n \geq n_0$ and


|i>|>C 


na (1 — a) \ d J 


(5.2) 


then

$$\varphi_n(T_{\mathrm{CUSUM}}) \leq \delta.$$

The reader can find the detailed proof of Theorem 5.2 in Appendix A.4. Comparing Theorems
3.3 and 5.2, it is clear that the CUSUM test exhibits a detection performance similar to that of the GLRT in
the increasing domain setting. In fact, we will show in the next section that both tests achieve minimax
optimality in that setting. According to the numerical studies, using the GLRT slightly improves the
detection performance compared to CUSUM, especially in the presence of strong long range dependence.
Nevertheless, in practice one can often afford to ignore the covariance structure of $G$ in the increasing domain setting,
in which case CUSUM is to be preferred for its simplicity.

Remark 5.3. The parameter $\alpha$, which characterizes the prior knowledge on the location of the possible abrupt
change, plays a similar role in both Theorems 5.1 and 5.2. The fact that $\alpha$ is a fixed, strictly positive quantity
means that the detectable jump for CUSUM in the increasing domain setting is of order $\sqrt{n^{-1}\log n}$.
However, this restriction on $\alpha$ is not critical for our proof approach, and the CDEP of CUSUM is still less
than $\delta$ if condition (5.2) holds in the case that $\alpha \to 0$ as the sample size grows. For instance, even if
$\alpha \asymp n^{\beta-1}$ for some $\beta \in (0,1)$, CUSUM is still consistent with the detection rate $|b| \geq \sqrt{n^{-\beta}\log n}$.


6 Minimax lower bound on detection rate 

In this section, we establish minimax lower bounds on the detectable jump in the mean of $G$. Theorem 6.1
shows that the rates obtained for the GLRT and plug-in GLRT (Theorems 3.1 and 4.1) are nearly optimal (up
to some logarithmic term in $n$) for rational spectral densities. Section 6.2 demonstrates the near minimax
optimality of the GLRT and CUSUM algorithms in the increasing domain setting. Before turning to the main
result of this section, let us rigorously introduce the notion of near minimax optimality.

Definition 6.1. Given $n$ samples, let $T \in \{0,1\}$ be a shift-in-mean detection algorithm whose CDEP
is denoted by $\varphi_n(T)$. $T$ is said to be near minimax optimal (up to some logarithmic term in $n$) in the
asymptotic sense if, for any $\delta \in (0,2)$, there are two vanishing sequences $\{h_{1,n}\}_{n=1}^{\infty}$ and $\{h_{2,n}\}_{n=1}^{\infty}$, depending on
$n$, $\delta$ and the spectral density, such that

1. $\varphi_n(T) \leq \delta$ whenever $|b| \geq h_{1,n}$.

2. When $|b| \leq h_{2,n}$ there is no algorithm whose CDEP is strictly less than $\delta$.

3. There exists a positive, bounded scalar $\beta$ for which $h_{1,n}/h_{2,n} = O(\log^{\beta} n)$ as $n \to \infty$.

Put simply, when the sample size is large enough, no algorithm is considerably superior to a near optimal
$T$, regardless of how complicated its formulation might be.


6.1 Lower bound in the fixed domain 


We begin this section by recalling that in the fixed domain regime, $G$ is a Gaussian process in $[0,1]$ which is
observed at $\{i/n\}_{i=1}^{n}$. We formally introduce the class of spectral densities that we consider in this section.
While the following conditions on $\hat{K}$ are more restrictive than Assumption 3.1, they still provide a rich class
of commonly used spectral densities.

Assumption 6.1. There are constants $p \in \mathbb{N}$ and $\beta \in (1/2,\infty)$ such that
1. lim K (w) exists and G^ := lim K {uj) G (0,oo). 


2 . 


lim sup 





< oo. 


Generally speaking, Assumption 6.1 contains the class of spectral densities $\hat{K}(\omega)$ for which there is some
$p \in \mathbb{N}$ such that $\hat{K}(\omega) \asymp \omega^{-2p}$ as $\omega$ tends to infinity. Note that the second condition in Assumption
6.1 serves theoretical purposes and does not have a simple qualitative interpretation. It can be observed
that Assumption 6.1 excludes any $\hat{K}(\omega)$ satisfying Assumption 3.1 with $(\nu + 1/2) \notin \mathbb{N}$. For instance,
Assumption 6.1 does not hold for Matern covariance functions with $(\nu + 1/2) \notin \mathbb{N}$.

Remark 6.1. Here, we name a salient class of spectral densities satisfying Assumption 6.1.

• Simple calculations show that any rational spectral density K (See (13.41) 1 admits Assumption 16.11 with 
G)f = A, /3 = 1 and p = deg(Qd) — deg(Qn) G N. Moreover, K satisfies Assumption 13.11 with 
V = p — 1/2. We discussed in Remark that Matern covariance function with p := (^ + 1/2) G N 
has indeed a rational spectral density. These particular instances of Matern covariance, which are 
commonly used in machine learning and geostatistics, are of the form K (r) = Q (|r|) where 

Q (■) is a polynomial of degree p — 1. 


Theorem 6.1. Let $\delta \in (0,2)$ and assume that Assumption 6.1 holds for $\hat{K}$. Consider the problem (2.3) in
which $\mathrm{cov}(X) = [K((r-s)/n)]_{r,s=1}^{n}$. There are positive scalars $C_K$ and $n_0 := n_0(\hat{K})$ such that if $n \geq n_0$ and


|4| < \/i«e(jp^). (6-1) 

then for any test $T$,

$$\varphi_n(T) \geq \delta.$$

See Appendix A.5 for the proof of Theorem 6.1.

Remark 6.2. Comparing the detection rates of the GLRT (see Theorem 3.1) and the plug-in GLRT (recall
Theorem 4.1) with the rate described in Eq. (6.1) establishes the near minimax optimality of the GLRT with
known covariance structure and of the plug-in GLRT for the class of spectral densities considered in Remark 6.1,
in the asymptotic sense. Strictly speaking, under the fixed domain setting, there is a gap of order $\sqrt{\log n}$
between (6.1) and the detection rate of the GLRT based algorithms studied earlier. Although we do not have
a proof establishing the near minimax optimality of the GLRT and plug-in GLRT for all spectral densities
satisfying Assumption 3.1, our conjecture is that Theorem 6.1 can be extended to this broader class.

6.2 Lower bound in the increasing domain setting

Turning to the increasing domain setting, we give a condition on the jump size $|b|$ under which no algorithm
in the increasing domain can reliably detect the existence of a shift in the mean. Unlike Section 6.1, there
is no distinction between the assumptions used to obtain the minimax lower bound and Assumption 3.2.

Theorem 6.2. Let $\delta \in (0,2)$, $\vartheta > 0$ and $C_{n,\alpha} = [\alpha n, (1-\alpha)n]$. Suppose that $\Sigma_n = T_n(f)$ in which $f$
satisfies Assumption 3.2. There are $n_0 := n_0(f,\vartheta)$ and a universal constant $C > 0$ such that if $n \geq n_0$ and



(l + d)/(0 ) log(^^) 
an 


then for any test $T$,

$$\varphi_n(T) \geq \delta.$$

A direct comparison of the detection rates of both the CUSUM (in Theorem 5.2) and the GLRT (see
Theorem 3.3) tests with the above result indicates the minimax optimality (up to a term of order $\log n$) of
both of these procedures in the increasing domain setting.

7 Simulation study 

To illustrate the performance of the proposed shift-in-mean detection algorithms, we conduct a set of controlled
simulation studies for verifying the results in Sections 3, 4 and 5. Our goals are two-fold:

(a) comparing the performance of the GLRT based algorithms with the standard CUSUM test.

(b) assessing the sensitivity of algorithm (2.5) to the parameters of the covariance function and tapering of
$\Sigma_n$.

In all the numerical studies in this section we fix $n = 500$ and $\alpha = 0.1$.

The area under the receiver operating characteristic (ROC) curve, which will be referred to as AUC, is a
standard way of assessing the performance of a test. The ROC curve plots the power against the false alarm
probability. Since the ROC curve is confined to the unit square, the AUC ranges in $[0,1]$. The ROC curve of
a test based on pure random guessing is the diagonal line between the origin and $(1,1)$, and so the AUC of any
realistic test is at least 0.5.

The subsequent figures in this section exhibit empirical AUC versus b. For a fixed value of b, covariance 
function K and a detection algorithm T, we apply the following method to compute the AUC of T: 

1. Set $T_1 = 500$ and $T_2 = 50$.

2. For $k = 1$ to $T_2$ repeat independently

(a) For $\ell = 1$ to $T_1$ repeat independently

i. Choose $p \in \{0,1\}$ with equal probability, which denotes the null or the alternative hypothesis. Thus,
approximately $T_1/2 = 250$ experiments correspond to each of the null and the alternative.

ii. If $p = 0$, generate a zero mean $X \in \mathbb{R}^n$ according to the covariance function $K$. That is, $X$ is sampled
from a Gaussian process with no abrupt shift in mean. Otherwise, choose $t \in [\alpha n, (1-\alpha)n] =
\{50, 51, \dots, 450\}$ uniformly at random (recall that $t$ represents the location of the mean shift)
and generate $X \in \mathbb{R}^n$ according to the alternative hypothesis with change point $t$.

iii. Compute the score of $T$.

(b) Numerically obtain the ROC curve of $T$ based upon the $T_1$ experiments in part (a).

(c) Given the ROC curve, compute $\mathrm{AUC}_k$ using the trapezoidal integration method.

3. Compute the average AUC by $\overline{\mathrm{AUC}} = \frac{1}{T_2}\sum_{k=1}^{T_2}\mathrm{AUC}_k$.
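
The procedure above can be sketched in a few lines. This is an illustrative outline rather than the code used for the reported experiments: `toy_sample` draws i.i.d. standard normal noise instead of a Gaussian process with covariance $K$, and `toy_score` is a placeholder statistic (the absolute difference between the means of the two halves of the sample), both being assumptions introduced only to keep the example self-contained. Any detector score and sampler with the same signatures can be substituted.

```python
import math
import random

def roc_auc(scores, labels):
    """Empirical ROC curve (one point per distinct threshold) integrated by the trapezoidal rule."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both hypotheses must be represented")
    fpr, tpr = [0.0], [0.0]
    for th in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= th and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= th and y == 0)
        tpr.append(tp / pos)
        fpr.append(fp / neg)
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0 for i in range(1, len(fpr)))

def average_auc(score_fn, sample_fn, T1=500, T2=50, seed=0):
    """Steps 1 to 3: T2 independent repetitions of T1 experiments, then the mean of the AUC_k."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(T2):
        scores, labels = [], []
        for _ in range(T1):
            label = rng.randint(0, 1)          # 0: null, 1: alternative, equal probability
            scores.append(score_fn(sample_fn(label, rng)))
            labels.append(label)
        aucs.append(roc_auc(scores, labels))
    return sum(aucs) / len(aucs)

def toy_sample(label, rng, n=50, alpha=0.1, b=1.0):
    """Placeholder sampler: i.i.d. N(0,1) noise, plus a mean shift of size b at a random t if label == 1."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    if label == 1:
        t = rng.randint(math.ceil(alpha * n), math.floor((1.0 - alpha) * n))
        x = [v + (b / 2 if k > t else -b / 2) for k, v in enumerate(x, start=1)]
    return x

def toy_score(x):
    """Placeholder score: |mean of second half - mean of first half|."""
    h = len(x) // 2
    return abs(sum(x[h:]) / (len(x) - h) - sum(x[:h]) / h)

print(average_auc(toy_score, toy_sample, T1=100, T2=5))
```

The quadratic-time ROC computation is adequate for $T_1 = 500$; a sort-based single pass would be preferable for much larger experiments.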

The first simulation study aims to compare the CUSUM and GLRT based algorithms in the fixed domain
regime and to assess the role of smoothness and other parameters of $K$ in the performance of the GLRT. For this
experiment $G$ is a Gaussian process in $[0,1]$ which is observed at regularly spaced samples, $\mathcal{D}_n = \{k/n\}_{k=1}^{n}$,
i.e., $X_k = G(k/n)$, $k = 1,\dots,n$. The covariance function of $G$ is assumed to have Matern form with
parameters $(\sigma_0,\rho_0,\nu)$. Strictly speaking,


cov {Xi,Xi) = alK^ 



i,l = l,...,u. 


K„ (x) 


\/4^r(z^ + 1/2) 

fH 



(1 + -^) 


G+l/2) 


duj, 


Vx > 0 . 


We consider three different scenarios for $\nu$: 0.5, 1, and 1.5. We also set $\sigma_0 = 1$ and $\rho_0 = 1/2$. As is customary
in the literature, we assume that $\nu$ is known and so $\nu$ will not be estimated. For conducting the plug-in
GLRT procedure, both parameters $(\sigma_0,\rho_0)$ are estimated using full MLE. Due to the low dimensionality of
the unknown parameters, the most effective way to estimate $(\sigma_0,\rho_0)$ is to apply a brute force grid search over a
pre-specified set $V$. Here, we choose $V = \{0.2, 0.4, \dots, 2\} \times \{1/4, 1/3.9, \dots, 1/0.1\}$. The final results of this
numerical study are exhibited in Figure 1. We observe the following:

• GLRT and plug-in GLRT have a significantly better detection performance than CUSUM. This performance
improvement is more pronounced for smoother covariance functions (larger $\nu$). In particular,
CUSUM is completely impractical for detection of a small change when $\nu = 1$ or $1.5$.

• In each panel of Figure 1, the GLRT has a slightly larger AUC than the plug-in GLRT. Thus,
Figure 1 verifies the existence of a small gap between the smallest detectable jump of GLRT and plug-in
GLRT. Note that this fact can also be observed by comparing (3.5) and (4.1). In short, having full
knowledge of the covariance parameters only slightly improves the detection performance, and so our proposed
algorithm is robust to the estimation error of the unknown parameters of $K$.

• Comparing the range of $b$ in each panel of Figure 1 reveals that a more rapid decay of the spectral
density can decrease the smallest detectable jump. This observation substantiates the role of $\nu$ in the
theory established in Sections 3 and 4.


Next, we compare the performance of the GLRT with known parameters and the CUSUM in the increasing
domain setting. Recall from Theorems 3.3 and 5.2 that these two methods have analogous asymptotic rates.
In the left panel of Figure 2 we choose an exponentially decaying covariance function

$$\mathrm{cov}(X_i, X_l) = f_{|i-l|} = \sigma_0^2\exp\left(-\frac{|i-l|}{\rho_0}\right), \quad i,l = 1,\dots,n,$$

in which $\sigma_0 = 1$ and $\rho_0 = 2$. That is, $\Sigma_n$ has exponentially decaying off-diagonal entries. In the
right panel, however, the chosen covariance function has a polynomially decaying tail given by

$$\mathrm{cov}(X_i, X_j) = f_{|i-j|} = \sigma_0^2\left(1 + \frac{|i-j|}{\rho_0}\right)^{-(1+\lambda)},$$

with $\sigma_0 = 1$, $\rho_0 = 2$ and $\lambda = 0.5$. In this case, $\Sigma_n$ has heavier off-diagonal terms. Note that Assumption
3.2 is satisfied in both cases. It is evident from Figure 2 that the GLRT exhibits a slightly
better performance than the CUSUM, and the gap between the two AUC curves is more visible in the case
of the polynomially decaying covariance function. Thus, we still recommend the use of the GLRT in the presence of
strong correlation among samples in applications described by the increasing domain regime.

Figure 1: The above figures assess the performance of different detection algorithms when $G$ is a one dimensional
Matern Gaussian process, with parameters $(\nu,\sigma_0,\rho_0)$, regularly sampled in $[0,1]$. From left to
right then from top to bottom, $(\nu,\sigma_0,\rho_0) = (0.5,1,0.5), (1,1,0.5), (1.5,1,0.5)$. In each panel the horizontal axis
displays the jump value $b$ and the three curves (dashed black, solid blue and green) respectively exhibit the AUC of
the GLRT with known covariance structure, the plug-in GLRT (PGLRT) using full MLE, and CUSUM.

Figure 2: The above figure assesses the performance of increasing domain detection algorithms. In each panel
the horizontal axis displays the jump value $b$ and the two curves (dashed black and solid blue) respectively
exhibit the AUC of the GLRT with known covariance structure and CUSUM. In the right panel, we choose
$\mathrm{cov}(X_i, X_l) = \sigma_0^2(1 + |i-l|/\rho_0)^{-(1+\lambda)}$, in which $(\sigma_0,\rho_0) = (1,2)$ and $\lambda = 0.5$. For the left panel, the
covariance function is given by $\mathrm{cov}(X_i, X_l) = \sigma_0^2\exp(-|i-l|/\rho_0)$, where $(\sigma_0,\rho_0) = (1,2)$.


8 Discussion 

As indicated in the Introduction, the comprehensive analysis of the detection of shift-in-mean of a Gaussian 
process in the fixed domain regime has remained relatively unexplored. However, the considered model in 
(HI) is only one of several plausible scenarios which should be subject to more thorough investigation. We 
note that the probabilistic model of G can be extended in some possible ways for future research. 

(a) Here we deal with a single abrupt change in EG. However, we believe that our techniques can be 
extended to rigorously formulate the minimax optimal rate of the GLRT and plug-in GLRT for detecting 
multiple shifts in EG. 

(b) Recently, Ivanoff et al. [30] studied the problem of change-set detection in two-dimensional Poisson
processes. Specifically, $G$ is a Poisson process on a two-dimensional domain and there are distinct scalars $\mu_0$, $\mu_1$ and a subset $\Omega$ of that domain such
that the intensity of $G$ can be formulated by

$$\mathbb{E}G(s) = \mu_0 \mathbb{1}_{\{s \in \Omega\}} + \mu_1 \mathbb{1}_{\{s \notin \Omega\}}. \tag{8.1}$$

The objective is to detect $\Omega$ as well as possible based on the observation of $G$. However, a comprehensive
study of minimax optimal change-set detection methods for multi-dimensional Gaussian processes
remains unavailable. Note that the fixed domain setting is the natural way to study the asymptotic
behaviour of algorithms for spatial processes. So, this paper can provide valuable intuition about
the minimax rate of change-set detection in Gaussian spatial processes. We expect that, aside from the
sample size and the smoothness of the covariance functions, the geometric properties of change-sets will
have a crucial role in the design and analysis of detection algorithms.

Appendices 

Appendix A contains the proofs of the main results in Sections 2-6. Appendix B states and proves the
technical results required in the proofs of the main results. In addition, Appendix C establishes properties
of the inverses of large Toeplitz matrices. Such results are useful for the proofs of Theorems 3.3, 5.2 and 6.2,
and may also be of independent interest.


A Proofs 

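A.1 Proofs for Section 2

Proof of Proposition 2.1. In the following, $\mathcal{L}$ stands for the generalized negative log-likelihood ratio:

$$2\mathcal{L} = X^\top \Sigma_n^{-1} X - \min_{t \in \mathcal{C}_{n,\alpha}} \min_{b \neq 0} \Big(X - \frac{b}{2}\zeta_t\Big)^\top \Sigma_n^{-1} \Big(X - \frac{b}{2}\zeta_t\Big). \tag{A.1}$$

Note that the objective function in (A.1) is quadratic in $b$. Expanding the quadratic form and maximizing over $b$, which is attained at $b = 2\,\zeta_t^\top \Sigma_n^{-1} X / \zeta_t^\top \Sigma_n^{-1} \zeta_t$, we arrive at

$$2\mathcal{L} = \max_{t \in \mathcal{C}_{n,\alpha}} \max_{b \neq 0} \Big\{ -\frac{b^2}{4}\, \zeta_t^\top \Sigma_n^{-1} \zeta_t + b\, \zeta_t^\top \Sigma_n^{-1} X \Big\} = \max_{t \in \mathcal{C}_{n,\alpha}} \frac{\big(\zeta_t^\top \Sigma_n^{-1} X\big)^2}{\zeta_t^\top \Sigma_n^{-1} \zeta_t}.$$

So, there is a threshold value, $R_{n,\delta} > 0$, for which the GLRT is given by (2.4).

□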
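The statistic obtained at the end of this proof, the maximum over $t$ of $(\zeta_t^\top \Sigma_n^{-1} X)^2 / (\zeta_t^\top \Sigma_n^{-1} \zeta_t)$, can be evaluated directly. The following Python sketch is only an illustration under stated assumptions: $\zeta_t$ is taken to be $-1$ on the first $t$ coordinates and $+1$ afterwards, the candidate set is $\{t : \alpha n \le t \le (1-\alpha) n\}$, and the exponential covariance and all function names are hypothetical choices, not taken from the paper.

```python
import numpy as np

def zeta(t, n):
    """Assumed sign vector: -1 on the first t coordinates, +1 afterwards."""
    z = np.ones(n)
    z[:t] = -1.0
    return z

def glrt_statistic(x, sigma, alpha=0.1):
    """Return (max_t Z_t^2, argmax t), where
    Z_t = zeta_t' Sigma^{-1} x / sqrt(zeta_t' Sigma^{-1} zeta_t)."""
    n = len(x)
    y = np.linalg.solve(sigma, x)            # Sigma^{-1} x, computed once
    best_val, best_t = -np.inf, None
    lo, hi = int(np.ceil(alpha * n)), int(np.floor((1 - alpha) * n))
    for t in range(lo, hi + 1):
        z = zeta(t, n)
        num = float(z @ y) ** 2
        den = float(z @ np.linalg.solve(sigma, z))
        if num / den > best_val:
            best_val, best_t = num / den, t
    return best_val, best_t

# Toy example: exponential covariance on a regular grid in [0, 1] (assumed choice).
n = 60
grid = np.arange(1, n + 1) / n
sigma = np.exp(-np.abs(grid[:, None] - grid[None, :]) / 0.5)
b, t0 = 4.0, 30
x_mean = 0.5 * b * zeta(t0, n)               # noiseless alternative, jump of size b at t0
stat, t_hat = glrt_statistic(x_mean, sigma)
```

On noiseless data the Cauchy-Schwarz inequality (in the inner product induced by $\Sigma_n^{-1}$) forces the maximum to occur at the true change point, which makes this a convenient sanity check before adding noise.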


The following result, expressing the form of the GLRT in the generic case of unknown $\mu$, can be proved in a way
analogous to Proposition 2.1.


Proposition A.l. There is R^ s > 0 for which the GLRT is given by 

^ Rn,S I ; 


Tglrt = I max 

t&Cn.a 


{Y,Ct-B, (t) 1 „) 




(A.2) 


where Y = (S„) ^ X and 


^2 (t) = C7 (Sn)-' Cb - 

In 


{cl (s. 


In (^ n ) Ir 


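A.2 Proofs for Section 3

Proof of Theorem 3.1. Let $p = \lceil \nu + 1/2 \rceil$, $P = \{1, \ldots, p\}$ and $\theta_n = \exp(-1/n)$. Construct a banded
lower triangular matrix $A_n \in \mathbb{R}^{n \times n}$ by the following procedure.

$$A_n[k, k-j] = \binom{p}{j} (-\theta_n)^j, \quad j \in \{0, \ldots, p\},\ k \in \{p+1, \ldots, n\}, \qquad A_n[P, P] = n^{-\nu} I_p.$$

It is relatively simple to verify that $A_n$ is invertible. In addition, for brevity let $Z_t = \zeta_t^\top \Sigma_n^{-1} X \big/ \sqrt{\zeta_t^\top \Sigma_n^{-1} \zeta_t}$ for
any $t \in \mathcal{C}_{n,\alpha}$, in which $\zeta_t$ has been defined in (2.2). Lastly, define $U_{n,t} := A_n \zeta_t \in \mathbb{R}^n$, $W := A_n X$ and
$D_n := \operatorname{cov}(W)$.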
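To see what the transformation $W = A_n X$ does, consider the simplest case $\nu = 1/2$, so that $p = 1$ and each row of $A_n$ below the first computes $X_k - \theta_n X_{k-1}$. The sketch below assumes that the entries of $A_n$ are the binomial differencing weights $\binom{p}{j}(-\theta_n)^j$ and that the covariance is $\exp(-|r|)$ sampled on the grid $k/n$; both are assumptions made for illustration, and the helper name `build_A` is hypothetical. Under these assumptions $X$ is a discretely sampled AR(1) sequence and $D_n = A_n \Sigma_n A_n^\top$ comes out diagonal.

```python
import numpy as np
from math import comb, exp

def build_A(n, p, nu):
    """Banded lower-triangular matrix: n^{-nu} I_p in the top-left block and
    binomial differencing weights C(p, j) (-theta_n)^j on rows p+1..n."""
    theta = exp(-1.0 / n)
    A = np.zeros((n, n))
    A[:p, :p] = n ** (-nu) * np.eye(p)
    for k in range(p, n):                 # 0-based row index, i.e. rows p+1..n
        for j in range(p + 1):
            A[k, k - j] = comb(p, j) * (-theta) ** j
    return A, theta

n, nu, p = 50, 0.5, 1
grid = np.arange(1, n + 1) / n
sigma = np.exp(-np.abs(grid[:, None] - grid[None, :]))   # assumed covariance exp(-|r|)
A, theta = build_A(n, p, nu)
D = A @ sigma @ A.T                                       # D_n = cov(A_n X)
off_diag = D - np.diag(np.diag(D))
```

For larger $\nu$ the matrix $D_n$ is no longer exactly diagonal, which is why the proof bounds its entries instead of computing them.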

Easy calculations show that under the null hypothesis $\{Z_t\}_{t \in \mathcal{C}_{n,\alpha}}$ is a set of standard Gaussian random
variables and so, by Lemma B.1, we have $P\big(\max_{t \in \mathcal{C}_{n,\alpha}} Z_t^2 \ge R_{n,\delta}\big) \le \delta/2$. That is, the false alarm probability
is less than $\delta/2$. Moreover, if the alternative hypothesis $H_{1,\bar t}$ (for some $\bar t \in \mathcal{C}_{n,\alpha}$) holds, then $\{Z_t^2\}_{t \in \mathcal{C}_{n,\alpha}}$ are
non-central $\chi_1^2$ random variables and the non-centrality parameter of $Z_{\bar t}^2$ is given by $\frac{|b|}{2} \sqrt{\zeta_{\bar t}^\top \Sigma_n^{-1} \zeta_{\bar t}}$.

Applying Lemma lB.il (an = dfe = 1 for any k) demonstrates that ipn (T 2 ) < 5, whenever 


1^1 \/Cj i^n) \i > \b\ ^min 



Ct>8 



(A.3) 


Thus, in order to get a sufficient condition on detectable b, it suffices to find a tight uniform lower bound on 
C7 Ct for t e Cn,a- 

The identity {Dn)~^ An can be shown using the linearity of covariance operator and non¬ 

singularity of A. Choose t G Cn,a in an arbitrary way. As a result of this alternative representation of 
we have Cj (En) ^ Ct = (^n) ^ Un,t- Applying Kantorovich inequality (cf. Appendix B) and the 

triangle inequality yields 


C7 (S„)-' Ct = Ul, [Dn)-^ Un,t > 


Uj t^riUn,t 


> 


\\Un,t\\l 


Dn 


(A.4) 


Now, we show that > 5 , for large enough n. Indeed, after some algebra, we can get 


t+p 


\\Un,t\\l > E Kt{k)=Y. 


- (1 - OnT + 2 E 

P 

= 2E 


j=o 


(“) 

S 2E 


k-1 






P r 


/c=l 




= 2 


2(p-l) 

p- 1 


> 2P, (A.5) 


where inequality (a) follows from the fact that for large enough n, 0n is arbitrarily close to 1. To get an 
upper bound on 


p 


t+p 


wunAi, = Eif^".‘(fc)i+ E \^n,tm+ E + E 

k—p-\-l k—t-\-p-\-l k—t-\-l 


k^l 

P 


= E^ 


-2iy 


k^l 


E a-0nr+ E (i-^.r+E 

k—p-\-l k—t-\-p-\-l 


k^l 


k-1 


_(i_0„)p+2Er 

j=0 


< pn-^'^ + A-P + 2^2 




k-1 


3^0 


E A <-+)' 


< 2 + 2 E 

/c=l 


k-1 


E (/](-+ 


< 2 + 4E 




k-1 


3^0 


E (;j (-1)^ 


= 2 + 4E 


k=l 


p-1 

k-1 


(- 1 ) 


i=0 

fe -1 ^2 + 2^^+^ < 3 2P. 


(A. 6 ) 


Note that inequality (b) is valid when pn~^''+A ~p < 2, which obviously holds for sufficiently large n = O (1). 
The remaining inequalities and identities in (IA. 6 I) can be easily verified via basic properties of the binomial 
coefficients. Combining (IA.5|) and (IA. 6 |) yields the desired goal. Now, inequality (|A.4I) can be rewritten as 


C7 (Sn)-' Ct > 


Q\\Dn\U 


9 max var(H^fc) 

l<k<n 


(A.7) 
In the final phase of the proof, we achieve a tight upper bound on $\max_{1 \le k \le n} \operatorname{var}(W_k)$. It is obvious from
the formulation of $A_n$ and the stationarity of $X - \mathbb{E}X$ that $\max_{1 \le k \le n} \operatorname{var}(W_k) = n^{-2\nu} \vee \operatorname{var}(W_{p+1})$. So,
the goal is reduced to giving an upper bound on the variance of $W_{p+1}$.


var (Wp+i) 


var 




\r—0 




i: 

r =0 


exp 


27T 

k 

-(l+jw) 


p 


r—0 
2 

duj = 




-jruj 


dui 




1 — e' 




2p 


duj 




K {to) [l + 6 *^ — 29n cos (a;/n)] ^ duo < 


P d,. f [1 + - 26>„ COS (^)] 


27^7 (l+w2) 


.+ 1/2 (A. 8 ) 


where identity (c) is implied by Bochner's theorem (cf. [53], Chapter 2) and (d) is an immediate consequence of
Assumption 3.1. Notice that


1 + 0 ^ - 26'„ cos < (1 - + 26»„ (^1 - cos ^ < ^ + 2 (^1 - cos ^ 

Let ^ = p— (i^ + l/2) < 1. Henceforth, for any i? > 0, 



27rn2^ 

Ck 


var(Wp_^^) 


{;^+ [^sinc(^)]"}'’ ^ r {l/n2+ [wsinc(l)]"}'’ 


< 


(e) 

< 


(/) 

< 


-R 

R 


J (l+w2)"+l/2 

|l/n^ + [wsinc (f )]^| 

(l/n2+w2)"+i/2 


-dw = 


du! 


(l/n2+c,j2)-+l/2 

|l/n^ + [wsinc 


doj 


(l/n^ + oj^)^ doj + 


J (l/n2+w2)-+i/2 
hl>fl 

|l/n^ + [wsinc (f )]^| 


duj 


-R 

R 


|Ld|>_R 


(l/n2+w2)-+i/2 


-duj 


J (^1 /+ oj'^Y duj + 5P J |a;| dw < 3i?^ + -. 


-R 


|w|>_R 


(A.9) 


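Inequality (e) follows from the fact that $\sup_{\omega \in \mathbb{R}} |\operatorname{sinc}(\omega/2)| \le 1$. In order to justify (f), observe that
$|\omega \operatorname{sinc}(\omega/2)| \le 2$ for any $\omega \in \mathbb{R}$. Thus, for large enough $n$ and $|\omega| \ge R$, we get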


|l/n 2 + [wsinc (f)]^| 


(l/n 2 +c,j 2 ) 


^+ 1/2 


< Iw 


|-(2.+l) 


(l/n2 + 4)^< 5P Iw 


-( 2 .+ 1 ) 


Note that there is some uq := no{R,i') such that sup^g^ (l/n^ + w2^^ < 3/2i?^ for all n > no- This 
immediately entails inequality (g). 

Finally, minimizing the obtained upper bound in (IA.9P over i? > 0, we get 


var (bFp+i) < CCifU 



(A.IO) 


for some universal constant C > 0. Thus, there is another strictly positive universal constant, C", for which 
maxi<fe<„ var (IFfe) = V var(lFp+i) < C'CKn~^'' (l + i). So, (jA.7l) implies that 


c7 (s„)-' Ct > 


CK{i+iy 


(A.11) 


The combination of (A.3) and (A.11) completes our proof. □

Proof of Theorem \3.^ The proof proceeds in a similar manner as that of the preceding theorem, in the sense 
that it is required to show that inequality (IA.3I) holds. Let 6n = exp . A„ represents the inverse of the 

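Cholesky factorization of $\Sigma_n$. For any $k \le j$ and $q \in [0,1]$, $G(k, j; q)$ denotes the following rational function.

$$G(k, j; q) = \frac{\prod_{\ell = j-k+1}^{j} \big(1 - q^{\ell}\big)}{\prod_{\ell = 1}^{k} \big(1 - q^{\ell}\big)},$$

and $G(k, j; 1) = \lim_{q \to 1} G(k, j; q)$. $G(k, j; q)$ is usually referred to as the Gaussian binomial coefficient in the combinatorics
literature. Finally, let $U_{n,t} := A_n \zeta_t$. Similar to (A.3), the aim is to obtain a universal lower bound on
$\zeta_t^\top \Sigma_n^{-1} \zeta_t$ for $t \in \mathcal{C}_{n,\alpha}$. Observe that $\zeta_t^\top \Sigma_n^{-1} \zeta_t = \|U_{n,t}\|_2^2$.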
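For readers less familiar with Gaussian binomial coefficients, the short sketch below evaluates the standard $q$-analogue $\prod_{\ell=j-k+1}^{j}(1-q^{\ell}) / \prod_{\ell=1}^{k}(1-q^{\ell})$ and illustrates that it approaches the ordinary binomial coefficient $\binom{j}{k}$ as $q \to 1$. The function name is hypothetical, and the product limits follow the standard definition, which is assumed to be the one intended here.

```python
from math import comb

def gaussian_binomial(k, j, q):
    """q-analogue of C(j, k) for 0 <= q < 1:
    prod_{l=j-k+1}^{j} (1 - q^l) / prod_{l=1}^{k} (1 - q^l)."""
    num, den = 1.0, 1.0
    for i in range(1, k + 1):
        num *= 1.0 - q ** (j - k + i)
        den *= 1.0 - q ** i
    return num / den

value_half = gaussian_binomial(2, 4, 0.5)        # equals (1 + q + q^2)(1 + q^2) at q = 0.5
near_one = gaussian_binomial(2, 4, 1.0 - 1e-6)   # close to C(4, 2) = 6
```

The limit $q \to 1$ is the regime relevant to the proof, since the parameter of the coefficient tends to 1 as $n$ grows.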

In order to achieve a tight lower bound on ||{7„^t||^ , it is pivotal to study the non-asymptotic behaviour 
of the entries of A„. According to Proposition 1 of [t^, the entries of An are given by 


i^n)jk — ( 


= -\/0, 


0-0 G(fc-l,j-l;gn) . 

/n ( 1 -c) 

V e=i 


4j>0' 


Since ^ tends to 0 as n gets large for any £ G {0,..., n} and lim — = 1, we get 


n(i-c) 

.i=i 


' 0-1 

n(i 


- exp - 


ip^ 


1 


(j - 1)! \P 




(A.12) 


Direct calculations show that G {k — l,j — H-i) in a small neighborhood of 1 and 

j,k G {!,..., n}. Thus, 


-\ e. 


\U-k) (3 1 


io-k) 

The asymptotic identities (IA.12I) and (IA.13I) come in handy to analyze ||C/„|| 


fc- 1 
2 


(A.13) 


WUnWl > E E 

3=t+l 0=t+l 


E (M-fc-E(^") 


jk 






E 

j=t+i 


1 


U - 1)! \P 


20 - 1 ) 


E (-1) 


= E — 




U - 1 )! \p) 


20 -1) 


j 

E(-i) 


U-k) (j 1 
fc- 1 


-E(-i) 


io-k) (j 1 


k=l 

t 


U-k) (j 1 
fc- 1 




U-k) (j 1 
fc- 1 


" 1 

E 77^ 


o=t+i 


U - 1)! \P 


20 - 1 ) r 


0-2(-iy 


j - 1 

t 


( i — 




(j - 1)! \P 


Thus, there are universal constants G, G' > 0 and Gp depending on a and p such that 


" to-i\^ / \ ' 

^ - E H (?) 


j=t+i 


> G 


ivr 

(n — 1 )! 


(a) 

> G‘ 


/ny J_ / en^ 

\t J yjn \np^ 


(b) 

> {GonT . 
Note that inequality (a) can be shown using Stirling's formula and (b) is an obvious implication of the fact that
$t \le (1 - \alpha) n$ (recall $\mathcal{C}_{n,\alpha}$ from Section 2.3). In summary, we have that


|6| v/a(Sn)-'Cn>l&l (Con^/^ 

We conclude the proof by appealing to Lemma B.1. □

Proof of Theorem 3.3. The proof is also similar to that of Theorem 3.1. By applying Lemma B.1, we need
to show that (A.3) holds. That is,

|6| ^min \J Ci > 8y^log(—(A.14) 


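So, the sufficient detectability condition will be obtained by finding a tight uniform lower bound on $\zeta_t^\top \Sigma_n^{-1} \zeta_t$
on $\mathcal{C}_{n,\alpha}$. For any $t \in \mathcal{C}_{n,\alpha}$, define $a_t, a'_t \in \{0,1\}^n$ by $a_t(i) = \mathbb{1}_{\{i \le t\}}$ and $a'_t = \mathbb{1}_n - a_t$. Observe that $a_t$ and $a'_t$
have non-overlapping supports and $\zeta_t = a_t - a'_t$. Thus,

$$\zeta_t^\top \Sigma_n^{-1} \zeta_t = a_t^\top \Sigma_n^{-1} a_t + a_t'^\top \Sigma_n^{-1} a'_t - 2\, a_t^\top \Sigma_n^{-1} a'_t \ \ge\ a_t^\top \Sigma_n^{-1} a_t - 2\, a_t^\top \Sigma_n^{-1} a'_t.$$

The last inequality leads to the following key result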


mm 

t&Cn 


C7 (s„)-'Ct 


> 


(b) 

> 


mm 


(s„) ^ at 


— 2 max 
tec„ „ 




\-i 


(“) . a 

> min — 

t^Cn.ac 


i^n) ^ at 




/(O) 


, (c) 

- e; - Cn > 


2 /( 0 )’ 


(A.15) 


in which and are appropriately chosen non-negative vanishing sequences, based on the 

developed results in Appendix [Cl In (IA.15I) . inequality (a) and the explicit form fn ^'re obtained from 
Lemma rC.il One can find a closed form of and verifies inequality (6) using Corollarv lC.il Furthermore, 
(c) holds whenever n is greater than some no, which depends on c and A. The proof of Theorem 13.31 is 
completed by combining (I A. 141) and (IA.15I) . □ 


A.3 Proofs for Section 4

Proof of Theorem\4- 1\ Recall dK,oo from (??) and define the event Am by Am ■= [dK,oo (?7m,0^) < p„ 


Moreover, we use Zt to denote 


C7(Sn)-^X 


Notice that rim, and are measurable functions on the 


sample space. Furthermore for any u G Am, Vm (u) and (u) represent the value of these measurable 
functions at u. Lastly, for u G Am define the random variable Zt (u) by 


Zt{u) 



(u)^ 

-1 

X 


(it) 

)■/■ 


As the range of ipn is [0, 2] (See Definition 13.ip . we have by Assumption ?? that 

Tn (TglRT^ < STto + E (^(fin {TgLRT^ \ Arr^ . (A. 16) 

As lim sup^_,,gc Pm < 1, there are 7 G (0,1) and mg G N such that pm < 7 for any m > ttiq. Choose 
7 < 7o < 1 in an arbitrary fashion. We aim to obtain a sufficient condition on b to control the second 
term in the right hand side of (IA.16I) below S. Notice that throughout the proof we assume that m > mo- 
Conditioning on the occurrence of Am, the following statement trivially holds for m. 
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3 Tj'ui ^ S.t. 


■ j ) 


< — < 1 . 
7o 


(A.17) 


Now choose u S Am arbitrarily. Define the covariance matrix of observations associated to r]m (u) by 
E„(u) = Similar to the proof of Theorem 13.11 it is necessary to study the two 

following quantities: 1. variance of Z (u) and 2. expected value of Z (u) under the alternative hypothesis, 
to control the false alarm and miss detection probabilities. Notice that 


at (u) := 


(b) 

< 


var Z(u) = 


(s„ (u)) ' S„ (s„ (u)) ' (t (a) C7 (s„ (u)) ' (u) (u)) ' Ct 

- < 5 - 


cr (s„ (u)) ' C 


Ct^(E„(u)) 'c. 


B 1- 


7o 


-1 


(A.18) 


Lemma [6.41 ensures the existence of some scalar B € (l,oo) for which inequality (a) in (IA.18I) holds (since rj 
and fjm {u) belong to the same equivalence class). Moreover, (6) can be easily deduced from the combination 

of (IA.17|) and the second inequality in Lemma IB. 21 Namely, at {u) < ao := B (l - . Now, using 

Lemma [B. II we get 


P ( max Z'^ > Rn,m \ Am 1 < I- 

,l<t<n ' j 2 


(A.19) 


. So, we have controlled 


Note that Lemmasuggests to take Rn,s,p^ = ao 1 + 2 ^log (^) + y^log (^)'j 
type 1 error from above in (IA.19I) . Now we turn to control the type II error from above. Assume that there 
is a sudden change in the mean of G at t S Cn,a- According to Lemma m type II error is less than S/2 
whenever for any u G Am 


\j Cj (Sn (m)) It > dcToy^log 


4B 


Applying Lemma IB. 21 and then Lemma IB. 41 one can easily show that 


'log 


c7(s„(u)) 'c 


f > 1 - 


^ C7 (Sn {u)) ' Ct > (Sn (u)) ' Ct- 


7o 


(A.20) 


(A.21) 


The combination of last inequality and (IA.20I) along one line of algebra leads to the following sufficient 
condition to control type II error holds if 


|6| min \/ C7 (S„) ^ Ct > 8 I - 


B 


3/2 


Pm 

70 


'log(¥ 


We conclude the proof by invoking (jA.llll to obtain (jA.22p . 

A.4 Proofs for Section 5

Proof of Theorem \5.1[ Choose t G G„ „ and define 


(A.22) 

□ 


[/; := 


tfn — t)/ 1 


n — t 


E ■ 
Moreover, set


Rn.S — 




n 1 + 2 log 


2n (1 - 2a) 


+ 2Wlog 


2n(l - 2a) 


Note that under the null hypothesis, is a zero mean random variable and 


lim var (Ut 

n—^oc> 


(“) 

= lim 


tin — t) f K (uj) 


n—^oo Ji^ 


27r 


n — t 


exp{-jkuj/n) - - exp (-jfca;/r 






lim 

n—^co 


kiiv) 


27r 




exp{—jku!/n) 1 — Pexp jkw/n) 


(a) 


Kju;) 

27T 


1-/3 




e-^‘^'^du - 


P 


E 

fc=i 


du! 


dbJ 


' 1 - 


P 


e-^^'^du 


dio = 


OO 

/ 


k{w)Gp M 


27r 




where 
G;3 (w) := 


(1 - p) sine ( 


n 2 


P sine 


(l-/3)< 


n 2 


+4/3 (1 — P) sine ( ^ ) sine 




sm 


(i)^ 

(A.23) 

The identity (a) is implied by Bochner Theorem and (&) follows from the dominated convergence theorem. 
It is easy to see that HGflll < 1 and so lim var([/*) < 1 by the triangle inequality. Moreover, Lemma 

IB. 81 shows that the achieved upper bound on cr^ = var (G*) is tight up to some constant whenever K has a 
uniformly bounded derivative. Namely, there is a universal constant c S (0,1) such that c < lim var (t/*) < 1 

n—^oc 

?2 


for any /3 S (0,1). Let R!^ g = Thus 

P(T = 1 I Ho) = P max |17t| > Rn,s \ Hq ) = P ( max |C/(*|^ > R* g \ Hq ) . 

\i^Cn,a J J 


(A.24) 


For any t G C„_q,, \U^\^ is a (non-normalized) Xi random variable, as cr^ < 1. Moreover |Cn,_o,| = n (1 — 2a). 
So the part (a) of Lemma IB. II says that 


^max^|t/*| >Rls 




Now we turn to control the miss detection probability. Without loss of generality assume that 6 > 0. Choose 
an arbitrary t G Cn^a- A line of algebra shows that 


E {U* I Hi_t) > 6v'a(l - a). 


(A.25) 


Eq. on b implies that E \ IHIi_t) > d-^log (2n (1 — 2a) /S). In other words, given a sudden jump at 
t, |C/*| , s G Cn,a are non-central Xi random variables satisfying the conditions of the part (b) of Lemma 
IB. 11 Hence 

P(T = 0 I Hi,t) = P( max \U*f < i?* J Hw I < 1 

\SGCn,Q ’ 


(A.26) 


□ 


Proof of Theorem \5.‘A We continue to use the same notation as the proof of Theorem 15.11 Note that there 
are three appropriately chosen vanishing sequences {“"IneN 
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var (Ut) = 


t 


(a) 

< 


n(n — t) 
t 


var I Xk 


n 


{n-t) 


var 


E 




n — t 
nt 

n — t 
nt 


\k^l / 
f t \ 

E^^ 


var 1 7 
\k^i 
/ t 

var 

\fc=i 


-cov 


\k—l k—t-\-l 


— —f (0) H-/ (0) + — / (0) + al 

n n 


(A.27) 


in which inequality (a) follows from Lemma C.2 and (b) is implied by identity (2.1.3) in [62]. Thus, there
is $n_0 \in \mathbb{N}$ (depending on $f$ and $\vartheta$) such that for any $n \ge n_0$, $\max_{t \in \mathcal{C}_{n,\alpha}} \operatorname{var}(U_t) \le (1 + \vartheta) f(0)$. The rest of the
proof is omitted because of its analogy to (A.24)-(A.26) in the proof of Theorem 5.1. □
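For orientation, the sketch below computes the classical textbook CUSUM statistic, $U_t = \sqrt{t(n-t)/n}\,\big(\bar X_{1:t} - \bar X_{t+1:n}\big)$, over a trimmed set of candidate change points. This is an assumption-laden illustration: the normalization shown is the classical one and may differ from the exact definition of the statistic analyzed in the two proofs above, and the function name and the trimming rule are hypothetical.

```python
from math import sqrt

def cusum_statistics(x, alpha=0.1):
    """Return {t: U_t} with U_t = sqrt(t (n - t) / n) * (mean(x[:t]) - mean(x[t:]))
    for candidate change points alpha*n <= t <= (1 - alpha)*n."""
    n = len(x)
    total = sum(x)
    out, left = {}, 0.0
    lo, hi = max(1, int(alpha * n)), min(n - 1, int((1 - alpha) * n))
    for t in range(1, n):
        left += x[t - 1]                      # running sum of the first t observations
        if lo <= t <= hi:
            out[t] = sqrt(t * (n - t) / n) * (left / t - (total - left) / (n - t))
    return out

# Noiseless toy signal: mean 0 on the first 50 samples, mean 1 afterwards.
x = [0.0] * 50 + [1.0] * 50
u = cusum_statistics(x)
t_hat = max(u, key=lambda t: abs(u[t]))
```

Note that the statistic uses only sample means and ignores the covariance of the observations, which is the feature contrasted with the GLRT throughout the paper.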


A.5 Proofs for Section 6

Proof of Theorem 16.11 We follow the standard method for bounding the Bayes risk from below. Observe 
that 


inf i^n{T) = 
> 


1 - sup inf [P (T = 0 I Ho) - P (T = 0 I Hi,*)] 

T 

(o) 

1- inf sup|P(T = 0 I Ho)-P(T = 0 I Hi,t)| > 1 - inf iJ(Po,Pi,t), 

*^^ 71 , 0 ; X ’ t^Cn,OL 


where (a) follows from inequality 2.27 in [55]. So, it suffices to show that inftgc„ „ (^o, Pi.t) < (1 “ <5)^- A 

few lines of straightforward algebra on the explicit form of the Hellinger distance of Gaussian measures indicate
that $\inf_T \varphi_n(T) \ge \delta$, whenever


.i"!. ^ (usb)) ■ 

Henceforth, it is enough to obtain a tight upper bound on $\inf_{t \in \mathcal{C}_{n,\alpha}} \zeta_t^\top \Sigma_n^{-1} \zeta_t$.
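The reduction to the quadratic form $\zeta_t^\top \Sigma_n^{-1} \zeta_t$ rests on a standard fact: for two Gaussian measures with a common covariance $\Sigma$ and means $\mu_0$, $\mu_1$, the Hellinger affinity (Bhattacharyya coefficient) equals $\exp\{-\tfrac{1}{8} (\mu_1 - \mu_0)^\top \Sigma^{-1} (\mu_1 - \mu_0)\}$. The sketch below evaluates it under assumptions that are ours, not the paper's: the squared Hellinger distance is normalized as one minus the affinity, the mean shift under the alternative is $\tfrac{b}{2}\zeta_t$ with $\zeta_t \in \{-1, +1\}^n$, and the covariance is exponential. All names are hypothetical.

```python
import numpy as np

def hellinger_affinity(mu0, mu1, sigma):
    """Bhattacharyya coefficient of N(mu0, sigma) and N(mu1, sigma):
    exp(-(mu1 - mu0)' sigma^{-1} (mu1 - mu0) / 8)."""
    d = np.asarray(mu1, dtype=float) - np.asarray(mu0, dtype=float)
    return float(np.exp(-d @ np.linalg.solve(sigma, d) / 8.0))

def squared_hellinger(mu0, mu1, sigma):
    """Squared Hellinger distance under the convention H^2 = 1 - affinity."""
    return 1.0 - hellinger_affinity(mu0, mu1, sigma)

n, t, b = 40, 20, 0.5
zeta = np.where(np.arange(1, n + 1) <= t, -1.0, 1.0)      # assumed sign vector
grid = np.arange(1, n + 1) / n
sigma = np.exp(-np.abs(grid[:, None] - grid[None, :]))    # assumed covariance
h2 = squared_hellinger(np.zeros(n), 0.5 * b * zeta, sigma)
```

A small value of `h2` means the null and the alternative are hard to distinguish, so an upper bound on the quadratic form translates into a lower bound on the detection risk.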

Let (7 = 1 and choose d > 0 by ' Furthermore, let Fd^p^a : R H> R denote the 

Matern spectral density parametrized by p and d and a as (13.21) . Note that d is well defined due to the 
first condition in Assumption 16.11 Define G R" by (k) = lk>t and let Ct = Ct ~ Ct for any t € Cn,a- 
Moreover, let = exp (—din) and S'* = |t + 1,..., n|. Finally, define the covariance matrix 'i>n G R"^" by 
= [Fd,p,a ((r - s) /n)]"^^^. Observe that 

c7 (S„)-' Ct = 2 (cJ (S„)-' Ct + C7 (S„)-' (S„)-' 

< 4 (E„)-' Ct V C'j (S„)-' • (A.29) 

We aim to prove that there is a constant C := C (p) > 0 for which Cj (D„)~^ Ct < Cn^^~^. The same upper 
bound can be obtained for (D„) Ct fo an analogous manner. 

We first show that 

( - 1 1 G (R) . (A.30) 

\Fd,p,a ) 

Let $M$ represent the finite limsup in the second condition of Assumption 6.1. Without loss of generality, we
can assume that $\beta < 2$ in Assumption 6.1. Using a few lines of algebra along with this condition, we get


lim sup 


Kjuj) 
Fd,p (w) 


-1 


= lim sup 


+ lim sup 






K 


UJ 


k (a;) 




'2 \ P 
2 

j2 \ P 


< lim sup 


p (k(u^ 


C' 


- 1 


K 


1 + 


-1 


(“) p |2p-2+/3 (f>) 

= M+ hmsup A (a;) |a;| = M. 

Notice that, identity (a) follows from Assumption 16.11 and first order Taylor expansion of (l + z)^ for in¬ 
finitesimal a; > 0. Moreover, (6) follows from the combination of /3 < 2 and the first condition in Assumption 
16.11 Namely, there is i? > 0 such that 


k{uj) 


Fd,p (^) 


-1 


2M , , 
- ^ 


which substantiates (jA.30p as /3 > 1/2. 

It is known (4.31, Chapter III, [55]) that there is a function ^ G (K) with bounded support such 
2 

that Fd^p^a (w) X 4>{^) as |a;| —>■ oo. Theorem 4 of Skorokhod [51] implies that the associated zero mean 

Gaussian measures to spectral densities K and Fd^p^a are equivalent. Based upon Lemma IB. 41 there exists 
a constant G (0, oo) such that 


1 

— < 




lim 


n^oo 1 

So, it suffices to show that kt < C'ikP~^ for some appropriately chosen C" > 0 depending on Sj 
and C. 

Letting ^ = p — 1/2 and recalling A„, W and form the proof of Theorem 13.II we have 




a («'n)-'6 = (4ln6)' D-^An^t) < 


T T^—l , 


W WAn^tWl 


(A.31) 


Amin {Dn{St,St))' 

Note that inequality (6) is inferred from supp (A^^t) = St- Applying a similar technique as (IA.5I) . we get 


Un^tWl = (r.-i-p)(l-0n)" + E + 

k^l J k^l \j=0 

- 2 ^ + 2 (2e)'’”^ < (2e)^ . 


(A.32) 


So, (4'„)-'6 < (2e)"'’ [Amin(^n(^t,^t))]”'- 

Next, we control the smallest eigenvalue of Zl„ {St, St) from the below. We first control the diagonal 
entries from below. Note that all the diagonal entries of I?„ {St, St) are the same and given by (cf. (lA.Sp i 


Q = 


Fd,p {to) 


2tt 


[l + 9^ — 29n cos {oj/n)Y dojoi J 


(c) f {d ^2 _ 2Q^ cos {uj/n)Yduj 


= n 


-2u 


(1 - 9n)^ + 40„ sin^ {duj/2) 
\/n? 


(d) „-2!^ f 

duj > —-— / [sinc((iw/2)] ^ duj = Cdn~'^'', (A.33) 
where (c) is obtained from (3.2) and the inequality (d) follows from the fact that for any $\gamma \in (0,1)$ (here we
put $\gamma = 2^{-p}$), there is $n_0(\gamma)$ such that for any $n \ge n_0(\gamma)$,


(1 — 0n)^ + 40„ sin^ {duj/2) 
\/n? + 


> 7 ^^ [sine (da;/ 2 )]^ . 


The proof of the last inequality will be skipped due to its simplicity. 

Now, let S := (5't, S-t) jQ. The combination of (IA.31II . (IA.32|) and (IA.33I) shows that 




-1 * . Chn 


Amin (^) 






CqU 


2p-l 


Amin (^) Amin (‘^) 


for some constants, Cq (p) and Cq depending on Cq, and K. It can be shown using identity 1.2 of [ 8 ] 
that there is some integrable function $g : [-\pi, \pi] \mapsto \mathbb{R}$ with $m_g := \operatorname{ess\,inf}(g) > 0$ such that $S$ is a $p$-banded
correlation matrix, i.e. $S(r, s) = 0$ for $|r - s| > p$, and $S = T_n(g)$. It remains to note that Lemma 6 of [22]
implies that $\lambda_{\min}(S) \ge m_g$ for any $n$, which concludes the proof. □

Proof of Theorem \6.S[ The proof is similar to the proof of Theorem in with some minor difference in the 
detail. Applying the classical technique of bounding (T) from below in terms of Hellinger distance, we 
need to verify (IA.28|) . Recalling the formulation of and from the proof of Theorem 16.11 we get the 
following inequality for any t G 


(Sn)-kt < 




(En)-'6ve; 


(Sn) 


1 

■'cn < 


4n ( 5 ) 4n (1 + i?) /» 


Notice that (a) is exactly the same as inequality (A.29). Moreover, (b) follows from Corollary C.1 and lastly,
there is $n_0 \in \mathbb{N}$, which depends on $f$ and $\vartheta$, for which inequality (c) holds. Using (A.34), one can easily
verify inequality (A.28), which concludes the proof. □


B Auxiliary results 


This section contains several technical results needed in Appendix A.

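Lemma B.1. Let $\sigma_0 \ge 1$ and $n \ge 2$. Let $\mathfrak{z} \in \mathbb{R}^n$ be a Gaussian random vector with $\mathbb{E}\mathfrak{z} = \mu$ and $\operatorname{var} \mathfrak{z}_k \le \sigma_0^2$
for any $1 \le k \le n$. Moreover, let $R_n = 1 + 2\big[\log\big(\tfrac{2n}{\delta}\big) + \sqrt{\log\big(\tfrac{2n}{\delta}\big)}\big]$. For any $\delta \in (0,1)$ and any $n \in \mathbb{N}$, the
following results hold.

1. If $\mu = 0$, then $P\big(\max_{1 \le j \le n} \mathfrak{z}_j^2 \ge \sigma_0^2 R_n\big) \le \tfrac{\delta}{2}$.

2. If $\max_{1 \le j \le n} |\mu_j| \ge 4 \sigma_0 \sqrt{\log\big(\tfrac{2n}{\delta}\big)}$, then $P\big(\max_{1 \le j \le n} \mathfrak{z}_j^2 \le \sigma_0^2 R_n\big) \le \tfrac{\delta}{2}$.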

Proof. For brevity, let aj = j = 1,... ,n. Notice that are standard Xi random variables, for 

any j = 1,... ,n. Lemma 8.1 in [7] implies that P (3| > o-'^Rn) < Thus, P (3| > cr^Rn) < ^ due to 
o’j < cTo- We conclude the proof of the first part by a union bound argument. Now, we turn to prove the 
second part. Define k := arg max \g,j\. It is easy to verify that < 4 log (^). Observe that 


max 

l<j<n 


3? < fToRn 




- < 
2 — 
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. The lower 


Moreover, is a non-central Xi random variables with non-centrality parameter Bj. := 

'’’fc _ 

bound condition on |/ifc| implies that Bk > 4^y^log (^). We finish the proof by the following inequality, 


.2 <4 


log 


(a) 

< 


^2 


<l + Bt-2. {l + 2Bl)\og 


(b) S 
< 

- 2 


In order to demonstrate inequality (a), we need to show that l+B‘^ — 2 ^(1 -|- 2B^) log (|) > 4 log (^) 

which can be shown by obvious inequality ctq /cfc > 1 and a few lines of algebra. Inequality (b) can be inferred 
from Lemma 8.1 of m- □ 
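The first part of Lemma B.1 can be sanity-checked numerically in the simplest setting of $n$ standard Gaussian coordinates. The sketch below assumes the threshold reads $R_n = 1 + 2[\log(2n/\delta) + \sqrt{\log(2n/\delta)}]$, which is our reading of the statement based on the chi-square tail bound and the union bound used in the proof; the function names are hypothetical. The exact tail of a $\chi_1^2$ variable is obtained from the complementary error function, so no simulation is needed.

```python
from math import erfc, log, sqrt

def threshold(n, delta):
    """R_n = 1 + 2 [log(2n/delta) + sqrt(log(2n/delta))] (assumed reading of the lemma)."""
    x = log(2.0 * n / delta)
    return 1.0 + 2.0 * (x + sqrt(x))

def chi2_1_tail(r):
    """P(Z^2 >= r) for a standard Gaussian Z, via the complementary error function."""
    return erfc(sqrt(r / 2.0))

n, delta = 100, 0.05
r_n = threshold(n, delta)
union_bound = n * chi2_1_tail(r_n)     # bounds P(max_j Z_j^2 >= R_n) for n standard Gaussians
```

The union bound is typically far below $\delta/2$, which reflects the conservativeness of the exponential tail inequality rather than any sharpness of the threshold.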

Proposition B.1 (Kantorovich inequality, (p. 452, [H])). Let $\Sigma \in \mathbb{R}^{n \times n}$ be a non-singular covariance
matrix and let $V \in \mathbb{R}^n$ be a non-zero vector. Then, $V^\top \Sigma^{-1} V \ge \dfrac{\|V\|_2^4}{V^\top \Sigma V}$.
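The inequality in Proposition B.1, read here as $V^\top \Sigma^{-1} V \ge \|V\|_2^4 / (V^\top \Sigma V)$ (the form in which it is applied in (A.4)), is a consequence of the Cauchy-Schwarz inequality, with equality when $V$ is an eigenvector of $\Sigma$. The following sketch checks both statements numerically on a randomly generated positive definite matrix; the matrix, the seed and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
m = rng.standard_normal((n, n))
sigma = m @ m.T + n * np.eye(n)          # a generic positive definite matrix
v = rng.standard_normal(n)

lhs = float(v @ np.linalg.solve(sigma, v))          # v' Sigma^{-1} v
rhs = float(np.sum(v ** 2) ** 2 / (v @ sigma @ v))  # ||v||_2^4 / (v' Sigma v)

# Equality holds when v is an eigenvector of Sigma.
w = np.linalg.eigh(sigma)[1][:, 0]
lhs_eig = float(w @ np.linalg.solve(sigma, w))
rhs_eig = float(np.sum(w ** 2) ** 2 / (w @ sigma @ w))
```

In the proofs, this lower bound is what allows the quadratic form in $\Sigma_n^{-1}$ to be controlled without inverting the covariance matrix explicitly.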

Lemma B.2. Let Kq and Ki be two covariance function with spectral densities Ko and ILi, respectively. 
Define, Eg := \Kq and Ei := \Ki Suppose that there exists p G (0,1) such that 

- 1 <p. Then, 

OO 

(a) Eo-Ei^ ^Ei. 

(b) Er'(Eo-Ei)Er'^T^Er^ 


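Proof. Trivial calculations on $\big\| \hat K_1 / \hat K_0 - 1 \big\|_\infty \le \rho$ show that for any $\omega \in \mathbb{R}$,

$$-\frac{\rho}{1+\rho}\, \hat K_1(\omega) \ \le\ \hat K_0(\omega) - \hat K_1(\omega) \ \le\ \frac{\rho}{1-\rho}\, \hat K_1(\omega). \tag{B.1}$$

Choose $v \in \mathbb{R}^n$ arbitrarily. The basic properties of spectral density and inequality (B.1) imply that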


27ru^ (Eg — El) u = / ( Kq (uj) — Ki (cu 


n 

2 


n 

e=i 

duJ < — -r / 

R 

Ki (uj) 



doj 


2ttp -t 


1 - p 


V Eiu. 


Thus, $\Sigma_0 - \Sigma_1 \preceq \frac{\rho}{1-\rho} \Sigma_1$. The second inequality is an obvious implication of the first inequality.
Lemma B.3. Let 6 G (0, 2), d G (0, oo) and define if : R i—!> K by /f (r) = cr^ exp Then, 

cr^^r {6) sin (^) 


□ 


lim K (uj) = Cs (d,cr) := 

CJ—>-oo 


nd^ 


Proof. It is obvious that Cs (d, a) = a'^Cs (d, 1), so without loss of generality assume that tr = 1. It is trivial 


that K (r) is of index <5 as |r| 0, i.e. lim . J = A° for any A > 0. Thus, applying the Tauberian 

|r|—lO 

Theorem (p. 35, [HS]) leads to 


lim [1 —if(l/a;)] ^ / K (u) du = 


r(5)sin(f) _ Cs{dA)d^ 


(B.2) 
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Moreover, the first order Taylor expansion of e ^ is at 0, implies that \1 — K (1/w)] ^ {duj) 1 as w —>■ 0. 
Thus, (IB.2I) can be rewritten by last limiting identity and applying L’Hospital’s rule. 


Cs {d, 1) = lim Sd ^ {duj)° 5uj^ I K (u) du = lim I K (tt) du 


lim K (w) |tLi| 

Cd —>-00 


1+5 


□ 


The following lemma is probably well known in the literature on Gaussian processes (e.g. identity 2
of [53] (p. 112) is analogous to, but not exactly the same as, part (a) of Lemma B.4). Because of the absence of
direct references, we include and prove the result in this section.


Lemma B.4. Let G^, i = 1, 2 be two zero mean stationary Gaussian process in [0,1] associated to covariance 
functions Ki, i = 1,2, respectively. For any n G N, define two positive definite covariance matrices by 
Y,n '■= \Ki (^^)] and '■= \K 2 (^^)] • If Gi and G 2 induce equivalent measures on the Hilbert space of 
([0,1]), then there exists an scalar B G [l,oo) for which 


V ' 


< B. 


2 . 


< lim inf„^o„ 

71^00 




< lim sup, 

n —700 




< B. 


Proof. We use Pi,i = 1,2 to denote the probability measures with respect to Gi, i = 1,2, respectively. 
Abusing the notation, X G K" represents the random vector generated by sampling Gaussian process at 
n S N. We prove the existence of a finite scalar Bi for which lim sup„^Q^ " 1 < Bi. 

Assume toward contradiction that lim sup„^Q tends to infinity. So, there is a sequence of non-zero 

n—kOO i- n V 

vectors {vn G such that 

limsup—— ^^—^= 00 . (B.3) 

n-s-oo vf 'i/nVn 


Gonsider the measurable event E„ 


|(ri„,X)| > y'T^nVn 


Simple calculations shows that 


Pi (E„) = Q (1), P 2 (E„) 



(B.4) 


in which Q (•) stands for the Q-function, i.e. Q (r) = f exp (—a:^/2) 1 (|a;| > r) dx. Gombining (IB.31) and 
(IB.41) leads to lim sup = 00 which contradicts the absolute continuity of Pi with respect to P 2 . One 

cam show using the same technique that there is B 2 G (l,c)o) such that 


— < lim inf 

B2 rt—>00 V^On 


V^'^nV 


We conclude the proof by choosing B = Hi V H 2 . Now, we turn to substantiate the second claim. Pick a 
non-zero vector v G R". According to Lemma IB. 51 there is an suitably chosen n—dimensional vector u (The 
inner product of u and v is necessarily 1) such that 






vJ'^nU „ 

-- < < B 

max^^ „)=i w ' E„a; it' 


Note that the inequality (a) is obtained from the first part of this Lemma. Taking supremum over all 
non-zero v G K" and n G N terminates the proof. □ 
Lemma B.5. Let $\Sigma \in \mathbb{R}^{n \times n}$ be a non-singular covariance matrix and let $\omega \in \mathbb{R}^n$ be a non-zero vector.
Then,

$$\frac{1}{\omega^\top \Sigma^{-1} \omega} = \min_{\langle v, \omega \rangle = 1} v^\top \Sigma v. \tag{B.5}$$

Proof. Since the optimization problem in (B.5) is a convex program with continuously differentiable objective
function and constraint, its minimal value can be obtained by solving the KKT equations. That is, there are
$\lambda > 0$ and $v$ such that

$$2 \Sigma v - \lambda \omega = 0, \qquad \lambda \big( \langle v, \omega \rangle - 1 \big) = 0.$$

Solving the above set of equations yields $v = \dfrac{\Sigma^{-1} \omega}{\omega^\top \Sigma^{-1} \omega}$. The desired result is established by substituting $v$
into the right hand side of (B.5). □
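The variational identity of Lemma B.5, read here as $\min_{\langle v, \omega \rangle = 1} v^\top \Sigma v = 1/(\omega^\top \Sigma^{-1} \omega)$ with minimizer $v^\star = \Sigma^{-1}\omega / (\omega^\top \Sigma^{-1} \omega)$, can be verified numerically. The sketch below uses a randomly generated positive definite matrix and compares the value at $v^\star$ with the closed form and with another feasible point; the test matrix and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
m = rng.standard_normal((n, n))
sigma = m @ m.T + n * np.eye(n)           # generic positive definite matrix
omega = rng.standard_normal(n)

s_inv_omega = np.linalg.solve(sigma, omega)
v_star = s_inv_omega / (omega @ s_inv_omega)      # solution of the KKT equations
min_value = float(v_star @ sigma @ v_star)        # should equal 1 / (omega' Sigma^{-1} omega)
closed_form = 1.0 / float(omega @ s_inv_omega)

# Any other feasible point (<v, omega> = 1) should not do better.
u = rng.standard_normal(n)
u_feasible = v_star + (u - (u @ omega) / (omega @ omega) * omega)  # shift orthogonal to omega
other_value = float(u_feasible @ sigma @ u_feasible)
```

The shift added to `v_star` is orthogonal to `omega`, so the constraint is preserved while the objective can only increase.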

Lemma B.6. Let $\Omega = \{\eta = (d, \sigma) : \eta \in \Omega\} \subset (0, \infty)^2$ be a compact set such that

$$\operatorname{dist}\big(\Omega, \{(x, y) : x = 0 \text{ or } y = 0\}\big) > 0. \tag{B.6}$$

The conditions of Proposition ?? are satisfied for the following scenarios.

1. $K(\cdot, \eta)$ is the powered exponential covariance function with known $\beta \in (0, 2)$ and $\eta \in \Omega$.

2. $K(\cdot, \eta)$ is the Matérn spectral density with known $\nu$, given by (3.2).

Proof. We first substantiate part 1. It is easy to verify the first condition in Proposition ?? for powered 
exponential covariance. For proving the continuity condition, let rjm be a convergent sequence in S2 to 77 . So 


K (•, IJm) 


K (-,77) 


< 

00 


\K {r,rim) — K {r,ri)\dr ^ 0 , as 777 —>• 00 


due to the dominated convergence Theorem. Finally, Lemma IB .31 and a few lines of algebra imply that 

sup||VlogG:x,n(77)||^ =2 sup J cr-^ -p {l 3 /df <00. 

ri^n {t7,d)en 

Notice that the last inequality is a consequence of (IB. 61 ) . Now we turn to the proof of second part. The 
verification of the first two conditions in Proposition ?? is analogous to the powered exponential covariance 
function. Note that €k,q (77) = v^rh^-i-1/2) ^-2^/ Matern covariance. Thus, 


sup llVlogCK,^ (?7)||<. = 2 sup J(T-^ + {i^/df < 00. 
T)en {<T,d)en 


□ 

Lemma B.7. Suppose that K {uj,r]) satisfies the conditions in the part 2 of Lemma [B.6I and the taper 
function /tap (w) admits Assumption ??. Let ATtap be the convolution of K {uj,r]) and /tap- Then, 


is a continuous function of 77 in S 2 . 


giv) = 


Aitap (•) V) 
K (-,77) 


Proof. Let dmax = sup {d : {a, d) G S 2 } and dmin = inf {d : (a, d) G S 2 }. Note that 0 < dmin < dmax < 00 as 
S 2 is a compact set in the interior of the upper right half-plane, define h : R x fl 1—^ ( 0 ,00) by 


h{uj,r]) 


A^tap (^; y) 
k (w,y) 


1 / ; 7 ~,k{u,rj) 

1= / /tap (w - 7i) -T- - -du- 1 . 

J K{uj,r)) 
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We first show that for any fixed w G K., h (w, rj) is a continuous function of rj. Define two real valued function 

hd ui{,u) = (1 and p (u, rj) := /tap (w — u) Note that hd ui is integrable and for any m G R, 

p {u, ■) is a continuous function of rj. Moreover, 


sup p {u, p) = /tap (w - u) sup ( ^ 

r]=(cr,d)en lyGfi \ « + U 

Choose 77 G D and a convergent sequence {Tynj^i C D to 77, in an arbitrary fashion. The dominated 
convergence theorem shows that 


i^+l/2 


< sup 




2 \ ^+1/2 


d 2 


^ ^d,ijj (^) ■ 


lim h{uj,pn) 

n—too 


lim 

n—too 


00 



— 00 


h{uj,p), 


which confirms the continuity of h with respect to 77. Now, we demonstrate the continuity of g. Notice that 
g (77) = \\h (•, 77)112. One can show using the exact same technique as Lemma 4 of | 16 | (with a minor algebraic 
difference) that there is a function g : R 1 —>■ K given by q {ui) = ci V C 2 |w| where ci, C 2 and r are 

appropriately chosen finite positive scalars, uniformly dominating h {lo, 77) over Q. Namely, 

sup h {lo, 77 ) < g (w). 


Thus, 


lim g{pn) 

n—^oo 


giv) 


< lim \\h{-,p) 

n—^oo 


h{-,Vn)\\2 



[h^ {uj,p) - (w. 




terminating the proof. Notice that identity (a) follows from the continuity of h with respect to 77 and the 
dominated convergence theorem. □ 


Lemma B.8. Let iL be a covariance function such that 
Then, there is a universal constant c > 0 such that 



< 00 and define Gp [0,1] by (IA.23I) . 

00 


inf [ K {lo) Gr {lo) doo > c. 
/3G(0.1) J 


— 00 


Proof. Observe that for any w G M, Gp {lo) is a quadratic function of /3 in the compact interval [0,1] and 
lim \\Gp^ — Gp\\^ = 0 for any convergent sequence /3„ —>■ j3. This property implies that 


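\[
\inf_{\beta \in (0,1)} \int_{-\infty}^{\infty} \hat{K}(\omega)\, G_\beta(\omega)\, d\omega \ge \frac{1}{2} \left[ \inf_{\beta \in (0,1),\, |\beta - 1/2| > r} \int_{-\infty}^{\infty} \hat{K}(\omega)\, G_\beta(\omega)\, d\omega \;\wedge\; \int_{-\infty}^{\infty} \hat{K}(\omega)\, G_{0.5}(\omega)\, d\omega \right] \tag{B.7}
\]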


for some sufficiently small $r > 0$. Observe that $G_\beta(0) = (1 - 2\beta)^2 > 0$ for $\beta \neq 1/2$. The differentiability of $G_\beta$ and $\hat{K}(\omega)$ implies the existence of a non-degenerate open interval $I_\beta$ centered at 0 such that


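\[
\inf_{\omega \in I_\beta} \hat{K}(\omega)\, G_\beta(\omega) \ge \frac{(1 - 2\beta)^2 \hat{K}(0)}{2} \quad \Longrightarrow \quad \int_{-\infty}^{\infty} \frac{\hat{K}(\omega)\, G_\beta(\omega)}{2\pi}\, d\omega \ge \frac{(1 - 2\beta)^2 \hat{K}(0)}{4\pi}\, |I_\beta|.
\]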





















Notice that $\inf_{|\beta - 1/2| > r} (1 - 2\beta)^2 |I_\beta| > 0$. So, we just need to show that the term corresponding to $\beta = 1/2$ on the right-hand side of (B.7) is strictly positive. For $\beta = 1/2$, $G_\beta(\omega) = [\mathrm{sinc}(\omega/4) \sin(\omega/2)]^2$ and so


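\[
\int_{-\infty}^{\infty} \frac{\hat{K}(\omega)\, G_\beta(\omega)}{2\pi}\, d\omega \ge \int_{-2\pi}^{2\pi} \frac{\hat{K}(\omega) [\mathrm{sinc}(\omega/4) \sin(\omega/2)]^2}{2\pi}\, d\omega \overset{(b)}{\ge} \left(\frac{2}{\pi}\right)^2 \int_{-2\pi}^{2\pi} \frac{\hat{K}(\omega) \sin^2(\omega/2)}{2\pi}\, d\omega \overset{(c)}{>} 0.
\]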


Note that (b) is a consequence of the monotonicity of sinc(·) on the interval $(0, \pi/2)$, and inequality (c) follows from the combination of $\hat{K}'(0) < \infty$ and $\hat{K}(0) > 0$. □
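The role of the sinc bound in step (b) can be illustrated numerically. The sketch below assumes the unnormalized convention sinc(x) = sin(x)/x (an assumption, since the convention is not restated here) and checks on a grid that sinc(ω/4) stays above 2/π for ω in (-2π, 2π). This is what allows the sinc factor of $G_{1/2}$ to be pulled out of the integral as a constant. The grid size is an arbitrary illustrative choice.

```python
import math

def sinc(x):
    """Unnormalized sinc, sin(x)/x, with the removable singularity filled in (assumed convention)."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def min_sinc_quarter(num_points=4001):
    """Minimum of sinc(w/4) over an interior grid of the open interval (-2*pi, 2*pi)."""
    step = 4.0 * math.pi / (num_points + 1)
    grid = [-2.0 * math.pi + (j + 1) * step for j in range(num_points)]
    return min(sinc(w / 4.0) for w in grid)

# sinc is decreasing on (0, pi/2), so its infimum there is sinc(pi/2) = 2/pi.
lower_bound = 2.0 / math.pi
print(min_sinc_quarter() >= lower_bound)
```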


C Non-asymptotic behaviour of the inverse of Toeplitz matrices 

In this section, we investigate some non-asymptotic properties of the inverse of Toeplitz matrices with polynomially decaying off-diagonal entries. The developed results play a crucial role in the analysis of the GLRT in the increasing domain setting. We first introduce some simplifying notation. For any symmetric and periodic function $f$ on $\mathbb{R}$ with period $2\pi$, define


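\[
T_{\mathbb{N}}(f) := \begin{pmatrix} f_0 & f_1 & f_2 & \cdots \\ f_1 & f_0 & f_1 & \cdots \\ f_2 & f_1 & f_0 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix},
\]

in which $\{f_m\}_{m \in \mathbb{Z}}$ denotes the set of Fourier coefficients of $f$ and $f_{-m} = f_m$ for any $m \in \mathbb{Z}$. Moreover, let $T_n(f) = [T_{\mathbb{N}}(f)]_{r,s=1}^{n}$. Finally, for any $S_n, S_n' \subset \{1, \dots, n\}$ and any $A \in \mathbb{R}^{n \times n}$, define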


\[
\Gamma(A, S_n, S_n') := \frac{1_{S_n}^{\top} A\, 1_{S_n'}}{\sqrt{|S_n|\, |S_n'|}},
\]

in which $1_{S_n} \in \mathbb{R}^n$ represents the indicator vector of $S_n$ in $\{1, \dots, n\}$. For the sake of brevity, we use the shorthand notation $\Gamma(L_n, S_n)$ if $S_n = S_n'$. Throughout this section we assume that there exists some $\alpha \in (0, 1/2)$ such that $C_{n,\alpha} = [\alpha n, (1 - \alpha) n]$, and that $L_n$ is a Toeplitz covariance matrix whose generator $f$ satisfies Assumption 3.2 for some $c$, $\lambda$, $m_f$ and $M_f$. We also use $L_{\mathbb{N}}$ to indicate the infinite-sized version of $L_n$.
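To make the notation concrete, the following sketch builds a finite symmetric Toeplitz matrix from a list of coefficients and evaluates a normalized bilinear form over two index sets. It assumes that the set statistic is the indicator-vector bilinear form divided by the square root of the product of the set sizes, which is our reading of the definition above. The coefficient sequence $(1 + m)^{-(1 + \lambda)}$ and the function names are hypothetical, chosen only for illustration.

```python
import math

def toeplitz(coeffs, n):
    """n x n symmetric Toeplitz matrix whose (r, s) entry is coeffs[|r - s|]."""
    return [[coeffs[abs(r - s)] for s in range(n)] for r in range(n)]

def gamma_stat(A, S, S_prime):
    """Assumed form: 1_S^T A 1_{S'} / sqrt(|S| |S'|), with 1-based index sets."""
    total = sum(A[r - 1][s - 1] for r in S for s in S_prime)
    return total / math.sqrt(len(S) * len(S_prime))

# Hypothetical polynomially decaying coefficients, for illustration only.
lam, n, t = 0.5, 8, 3
f_coeffs = [(1 + m) ** (-(1 + lam)) for m in range(n)]
L_n = toeplitz(f_coeffs, n)
S_n = list(range(1, t + 1))            # {1, ..., t}
S_n_c = list(range(t + 1, n + 1))      # complement in {1, ..., n}
print(gamma_stat(L_n, S_n, S_n_c))
```

Plain nested lists keep the example dependency-free; for large $n$ a numerical library would be the more natural choice.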

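Lemma C.1. Let $S_n = \{1, \dots, t\}$ for some $t \in C_{n,\alpha}$ and $S_n^c = \{1, \dots, n\} \setminus S_n$. There is a bounded constant $C > 0$ depending on $f$ and $\alpha$ such that

\[
|\Gamma(L_n^{-1}, S_n, S_n^c)| \le \frac{C}{\sqrt{\alpha (1 - \alpha)}}\, n^{-(\lambda \wedge 1)} \left(1 + 1_{\{\lambda = 1\}} \log n\right).
\]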


Proof. Without loss of generality assume that $t \le n - t$. The proof is based upon the well-known fact that for a positive definite matrix with polynomially decaying off-diagonal entries, the corresponding elements of its inverse shrink at the same rate (see e.g. [31]). For any $d \in \{1, \dots, n - 1\}$, let

\[ C(d, t) := \{(r, s) : r \in S_n,\ s \in S_n^c,\ |r - s| = d\}. \]

A simple counting argument leads to 


\[
|C(d, t)| = \begin{cases} d, & d \in \{1, \dots, t\} \\ t, & d \in \{t, \dots, n - t\} \\ n - d, & d \in \{n - t + 1, \dots, n - 1\} \end{cases} \tag{C.1}
\]
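The counting identity (C.1) is easy to confirm by brute force. The sketch below enumerates the pairs directly for small $n$ and compares the count with the piecewise formula, under the standing assumption $t \le n - t$. The function names are ours and are not part of the original argument.

```python
def count_pairs(n, t, d):
    """Brute-force |C(d, t)|: pairs (r, s) with r in {1..t}, s in {t+1..n}, |r - s| = d."""
    return sum(1 for r in range(1, t + 1) for s in range(t + 1, n + 1) if abs(r - s) == d)

def count_formula(n, t, d):
    """Piecewise formula (C.1), valid when t <= n - t."""
    if d <= t:
        return d
    if d <= n - t:
        return t
    return n - d

# Compare the two counts over all admissible (n, t, d) with small n.
mismatches = [
    (n, t, d)
    for n in range(2, 25)
    for t in range(1, n // 2 + 1)
    for d in range(1, n)
    if count_pairs(n, t, d) != count_formula(n, t, d)
]
print(len(mismatches))
```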
Using the triangle inequality and rearranging the terms of $\Gamma(L_n^{-1}, S_n, S_n^c)$, we get

\[
|\Gamma(L_n^{-1}, S_n, S_n^c)| \le \frac{1}{\sqrt{t (n - t)}} \sum_{d=1}^{n-1} |C(d, t)| \max_{(r,s) \in C(d,t)} |L_n^{-1}(r, s)|. \tag{C.2}
\]

According to Lemma A1 of [23], there is a $C(f)$ for which the entries of $L_n^{-1}$ satisfy the following inequality,

\[
|L_n^{-1}(r, s)| \le \frac{C(f)}{(1 + |r - s|)^{1 + \lambda}}. \tag{C.3}
\]


We complete the proof by substituting inequality (C.3) and identity (C.1) into inequality (C.2). We skip the algebraic details due to lack of space.


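\[
|\Gamma(L_n^{-1}, S_n, S_n^c)| \le \frac{C}{\sqrt{t (n - t)}} \left[ \sum_{d=1}^{t} \frac{d}{(1 + d)^{1 + \lambda}} + \sum_{d=t+1}^{n-t} \frac{t}{(1 + d)^{1 + \lambda}} + \sum_{d=n-t+1}^{n-1} \frac{n - d}{(1 + d)^{1 + \lambda}} \right] \overset{(a)}{\le} \frac{C}{\sqrt{\alpha (1 - \alpha)}}\, n^{-(\lambda \wedge 1)} \left(1 + 1_{\{\lambda = 1\}} \log n\right).
\]

Note that inequality (a) follows from the fact that $\sqrt{t (n - t)} \ge n \sqrt{\alpha (1 - \alpha)}$. □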
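The skipped algebra can be sanity-checked numerically. The sketch below evaluates the weighted sum obtained by combining the counts in (C.1) with the decay $(1 + d)^{-(1 + \lambda)}$ from (C.3), divides by $\sqrt{t(n - t)}$, and compares the result with the rate $n^{-(\lambda \wedge 1)}(1 + 1_{\{\lambda = 1\}} \log n)$. The choice $t = n/2$ and the tested values of $\lambda$ and $n$ are illustrative assumptions. Bounded ratios are consistent with the stated rate but do not replace the proof.

```python
import math

def weighted_sum(n, t, lam):
    """sum_d |C(d,t)| (1 + d)^-(1 + lam) / sqrt(t (n - t)), assuming t <= n - t."""
    total = 0.0
    for d in range(1, n):
        count = d if d <= t else (t if d <= n - t else n - d)
        total += count / (1 + d) ** (1 + lam)
    return total / math.sqrt(t * (n - t))

def rate(n, lam):
    """Claimed rate n^-(lam ^ 1) * (1 + 1{lam = 1} log n)."""
    log_term = math.log(n) if lam == 1.0 else 0.0
    return n ** (-min(lam, 1.0)) * (1.0 + log_term)

# Ratio of the exact sum to the claimed rate, for a few illustrative settings.
ratios = {
    lam: [weighted_sum(n, n // 2, lam) / rate(n, lam) for n in (100, 1000, 10000)]
    for lam in (0.5, 1.0, 2.0)
}
for lam, values in ratios.items():
    print(lam, [round(v, 3) for v in values])
```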

The proof of the following result is omitted due to its analogy to the proof of Lemma C.1.

Lemma C.2. With the same conditions and notation as Lemma C.1, there is a non-negative constant
$C := C(f)$ such that




\[
\le \frac{C}{\sqrt{\alpha (1 - \alpha)}}\, n^{-(\lambda \wedge 1)} \left(1 + 1_{\{\lambda = 1\}} \log n\right).
\]


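Proposition C.1. Define the infinite column vector $\nu_n \in \mathbb{R}^{\mathbb{N}}$ by $\nu_n(r) = 1_{\{r \le n\}}$. Then, there is a constant $C := C(f, \alpha)$ such that

\[
\left| \frac{1}{n} \nu_n^{\top} L_{\mathbb{N}}^{-1} \nu_n - \frac{1}{f(0)} \right| \le C \left( n^{-(\lambda - \epsilon)} 1_{\{\lambda \in (0,1]\}} + n^{-1} 1_{\{\lambda > 1\}} \right),
\]

in which $\epsilon \in (0, \lambda)$ is chosen in an arbitrary way.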

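Proof. We obtain an upper bound on $\left( \frac{1}{n} \nu_n^{\top} L_{\mathbb{N}}^{-1} \nu_n - 1/f(0) \right)$, and the lower bound can be achieved using akin techniques. For brevity, let $g = 1/f$. According to Proposition 1.12 of [9], $L_{\mathbb{N}}^{-1}$ satisfies the following identity,

\[
L_{\mathbb{N}}^{-1} = T_{\mathbb{N}}(g) + L_{\mathbb{N}}^{-1} H_{\mathbb{N}}(f) H_{\mathbb{N}}(g), \tag{C.4}
\]

in which $H_{\mathbb{N}}(f)$ is the Hankel matrix generated by $f$ as

\[
H_{\mathbb{N}}(f) = \begin{pmatrix} f_1 & f_2 & f_3 & \cdots \\ f_2 & f_3 & f_4 & \cdots \\ f_3 & f_4 & f_5 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}.
\]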
$H_{\mathbb{N}}(g)$ can also be defined in a similar way. A simple algebraic manipulation of identity (C.4) leads to




-^n 7n (5 ^- 

n n 


n 

(a) 1 


^ ^ n n 

-'4.%i.g)'^n + -'^^{L^^HnU)en{r) ■,HT^{g)eTi{k)) 


r=l s=l 

II r -1II 

T-7r_/„^,, I ll'^N ll2->.2 


< -v^Tn{g)vn + 
n 


n 


^ ||i?N ( 5 ) CN {r)\\i^ ^ II(/) CN (s)ll^ 


1 1 / X > ^ 

< ( 5 ) Vn + - ^ ||i?N {g) bn (r)||^^ V ^ \\H^ (/) (s) 

^ \r=l s=l 


(C.5) 


Note that (a) is a direct consequence of the generalized Cauchy-Schwarz inequality combined with simple properties of the operator norm; the last inequality of (C.5) can be obtained from inequality (1.14) of [H] on the operator norm of Toeplitz matrices. As the next step, we control $\sum_{r=1}^{n} \|H_{\mathbb{N}}(g) e_{\mathbb{N}}(r)\|_{\ell_2}$ from above. It is known that if $m_f > 0$ and $f$ satisfies condition (3.9), then $g$ does as well. So, there are bounded constants $c(\lambda), c'(\lambda), c''(\lambda) > 0$ such that


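\[
\sum_{r=1}^{n} \|H_{\mathbb{N}}(g) e_{\mathbb{N}}(r)\|_{\ell_2} = \sum_{r=1}^{n} \sqrt{\sum_{k=r}^{\infty} |g_k|^2} \le c \sum_{r=1}^{n} \sqrt{\sum_{k=r}^{\infty} k^{-(2 + 2\lambda)}} \le c' \sum_{r=1}^{n} r^{-(1/2 + \lambda)} \le c'' \left( n^{1/2 - \lambda} 1_{\{\lambda \in (0, 1/2)\}} + \log n\, 1_{\{\lambda = 1/2\}} + 1_{\{\lambda > 1/2\}} \right). \tag{C.6}
\]

Hence, there is some strictly positive constant $c$ (depending on $\lambda$) such that

\[
\frac{1}{n} \nu_n^{\top} L_{\mathbb{N}}^{-1} \nu_n \le \frac{1}{n} \nu_n^{\top} T_{\mathbb{N}}(g) \nu_n + c \left( n^{-2\lambda} 1_{\{\lambda \in (0, 1/2)\}} + n^{-1} \log^2 n\, 1_{\{\lambda = 1/2\}} + n^{-1} 1_{\{\lambda > 1/2\}} \right). \tag{C.7}
\]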
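The three regimes in the column-norm bound (C.6) can be observed numerically. The sketch below takes the model decay $k^{-(2 + 2\lambda)}$ for the squared coefficients, truncates the infinite tail at a large cutoff (an approximation we introduce), and compares $\sum_{r \le n} \sqrt{\sum_{k \ge r} k^{-(2 + 2\lambda)}}$ with $n^{1/2 - \lambda}$, $\log n$ or a constant, depending on $\lambda$. The function names, cutoff and tested values are illustrative assumptions.

```python
import math

def column_norm_sum(n, lam, cutoff=50000):
    """sum_{r=1}^n sqrt(sum_{k>=r} k^-(2 + 2 lam)), with the tail truncated at `cutoff`."""
    tail = 0.0
    tails = [0.0] * (n + 1)
    # Accumulate the tail sums from the largest index downwards.
    for k in range(cutoff, 0, -1):
        tail += k ** (-(2.0 + 2.0 * lam))
        if k <= n:
            tails[k] = tail
    return sum(math.sqrt(tails[r]) for r in range(1, n + 1))

def claimed_rate(n, lam):
    """Growth rate on the right-hand side of the bound, by regime of lam."""
    if lam < 0.5:
        return n ** (0.5 - lam)
    if lam == 0.5:
        return math.log(n)
    return 1.0

ratios = {
    lam: [column_norm_sum(n, lam) / claimed_rate(n, lam) for n in (10, 100, 1000)]
    for lam in (0.25, 0.5, 1.0)
}
for lam, values in ratios.items():
    print(lam, [round(v, 3) for v in values])
```

Ratios that stay bounded as $n$ grows are what the bound predicts; the constants themselves depend on the truncation and on the model decay used here.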


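Lastly, we obtain an upper bound on $\frac{1}{n} \nu_n^{\top} T_{\mathbb{N}}(g) \nu_n$. The quantity $\nu_n^{\top} T_{\mathbb{N}}(g) \nu_n$ can be viewed as the variance of $S_n = \sum_{k=1}^{n} X_k$, where $\{X_k\}_k$ is a stationary process with spectral density $g$. Thus, identity 2.1.2 of [62] shows that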


-i^lTn{g)iyn = 
n 


irn 


sin (nw) 
sin (uj) 


g (2a;) duj = 


g(0) f / sin (nuj) 
TTn J \ sin (a;) 


dw 


2 f / sin (noj) \ ^ ^ _/nv 1 (“) 1 


sin (a;) 


[g (2w) - g (0)1 du} = 


/(O) 


sin (nui) 
sin (a;) 


[g (2w) - g (0)] duj 


ib) 1 
< 


/(O) 


2 E I/ml I'm. 

m^h 

TTurriff (0) 


A' 


Wda,. 

sm (cj) 


(C.8) 


Identity (a) follows from the following result, which can be proved by applying Parseval's identity to the triangular pulse centred at 0.

\[
\frac{1}{\pi n} \int_{-\pi/2}^{\pi/2} \left( \frac{\sin(n\omega)}{\sin(\omega)} \right)^2 d\omega = 1.
\]
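The normalization behind identity (a) can be verified numerically. The sketch below assumes that the integral is taken over one full period of the integrand, here $[-\pi/2, \pi/2]$ (our assumption about the limits), and applies the midpoint rule. Since $(\sin(n\omega)/\sin(\omega))^2$ is a trigonometric polynomial of period $\pi$, the midpoint rule on a sufficiently fine uniform grid is accurate up to rounding error.

```python
import math

def normalized_kernel_integral(n, num_points=None):
    """Midpoint-rule value of (1 / (pi n)) * integral of (sin(n w) / sin(w))^2 over [-pi/2, pi/2]."""
    if num_points is None:
        num_points = 4 * n  # even and larger than the number of harmonics
    h = math.pi / num_points
    total = 0.0
    for j in range(num_points):
        w = -math.pi / 2.0 + (j + 0.5) * h
        s = math.sin(w)
        # The ratio tends to +/- n as sin(w) -> 0; guard the removable singularity.
        ratio = float(n) if abs(s) < 1e-15 else math.sin(n * w) / s
        total += ratio * ratio
    return total * h / (math.pi * n)

for n in (1, 2, 5, 20):
    print(n, normalized_kernel_integral(n))
```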
Based upon the following inequality, $g$ inherits the Hölder property from $f$.

\[
|g(2\omega) - g(0)| = \left| \frac{f(2\omega) - f(0)}{f(0)\, f(2\omega)} \right| \le \frac{|f(2\omega) - f(0)|}{f(0)\, m_f}.
\]
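As a concrete instance of this inheritance step, the sketch below takes a hypothetical generator $f(\omega) = 2 + \cos(\omega)$, whose minimum is $m_f = 1$, sets $g = 1/f$, and checks both the identity and the upper bound on a grid. The choice of $f$, the grid and the function names are illustrative assumptions and are not taken from the paper.

```python
import math

M_F = 1.0  # minimum of the hypothetical f below

def f(w):
    """Hypothetical positive, even, 2*pi-periodic generator (illustration only)."""
    return 2.0 + math.cos(w)

def g(w):
    return 1.0 / f(w)

def check_inheritance(num_points=401):
    """Return (largest gap in the identity, whether the upper bound holds on the grid)."""
    max_gap, bound_ok = 0.0, True
    for j in range(num_points):
        w = -math.pi / 2.0 + j * math.pi / (num_points - 1)
        lhs = abs(g(2.0 * w) - g(0.0))
        exact = abs(f(2.0 * w) - f(0.0)) / (f(0.0) * f(2.0 * w))
        bound = abs(f(2.0 * w) - f(0.0)) / (f(0.0) * M_F)
        max_gap = max(max_gap, abs(lhs - exact))
        bound_ok = bound_ok and lhs <= bound + 1e-12
    return max_gap, bound_ok

print(check_inheritance())
```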


Hence, (b) is implied by this inheritance property and Proposition 3.2.12 of [H], given that $f$ is a $\lambda'$-Hölder function for any $0 < \lambda' < \lambda$.

For any A G (0,1], choose e = e (A) >0 such that (A — e) > 0. Let A' = (A — e) + '!^1 {a>i}- 

Obviously $\lambda' < \lambda$, and $\lambda' > 1$ for any $\lambda > 1$. Notice that $\sum_{m \in \mathbb{Z}} |f_m|\, |m|^{\lambda'}$ is bounded, so,



sin (no;) 
sin (a;) 


2 

|a;|^ duj 




Oi) 

< 


sin^ {noj) x 


(nuj) 


a; I dnuj = 


_X f sin^ (u) 


,2-A 


-/-du 


c{n ^+'^ 1 {AG( 0 , 1 ]} H{a>i}) . 


(C.9) 


Inequality (uq) follows from the fact that ^ < 1 for any a; G (O, f) and (oi) is given by simple 

integration techniques. Combining (C.7)-(C.9) concludes the proof. □


Proposition C.2. Let $S_n = \{1, \dots, n\}$. Under the same notation and assumptions as Proposition C.1, we have


~^n — T , Sn) 


< 


C (n "h{AG(o,i]} 


■n~^l 


{A>1} , 


we 


Proof. Using Widom's theorem (Theorem 2.14, [S]) and one line of straightforward algebra, there is a matrix $D_n \in \mathbb{R}^{n \times n}$ such that



= 

-V^Vn -Tn{g)) VnVn + T {Dn, Sn) 

n 


n 


< 


\Dn 


-^uVn{L 
n 


-1 


S~H (</)) 


Ui is a bounded operator which is defined by Vn (r') = (r'n, ■ • ■ i 0, 0, • • •) for any v G Based upon 
Theorem 2.15 of [H] (first equation, p. 44), ||iA„|| 2_,,2 = o (n~^). Moreover, the special form of Vn, gives 


1 

n 


(YriVn) (Lpj 


S~H (<?)) 


T /r-1 


{L 


7n {g)) Vn 


Ultimately, combining the same tricks as inequality (a) in (C.5), identity (C.4) and inequality (C.6) yields


-vJVn -Tn (g)) VnVr. 


<C'[n '^^l{Ae(o,i]} + ^ ^l{A>i}) 


for some constant $C'(\lambda) > 0$. We end the proof by using the triangle inequality.
Corollary C.1. There is a constant $C > 0$ depending on $f$ such that


'r(S„\{l,...,n}) 


1 

W) 


<C'(n ^^1{AG(0,1]} + ^1{A>1}) • 


□ 


(C.IO) 